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Abstract 

Denoising by frame thresholding is one of the most basic and efficient methods for 
recovering a discrete signal or image from data that are corrupted by additive Gaussian 
white noise. The basic idea is to select a frame of analyzing elements that separates the 
data in few large coefficients due to the signal and many small coefficients mainly due to 
the noise e n . Removing all data coefficients being in magnitude below a certain threshold 
yields an approximation to the original signal. In order that a significant amount of 
the noise is removed and at the same time relevant information about the original 
signal is kept, a precise understanding of the statistical properties of thresholding is 
important. For that purpose we compute, for the first time, the asymptotic distribution 
of max we £2 n I (C,£n) | for a wide class of redundant frames (</>™ : u € f2„). Based 
on our theoretical results we give a rationale for universal extreme value thresholding 
techniques yielding asymptotically sharp confidence regions and smoothness estimates 
corresponding to prescribed significance levels. The results cover many frames used in 
imaging applications, such as redundant wavelet systems, curvelet frames, or unions of 
bases. We show that 'generically' a standard Gumbel law results as it is known from 
the case of orthonormal wavelet bases. However, for specific highly redundant frames 
other limiting laws may occur. We indeed verify that the translation invariant wavelet 
transform shows a different asymptotic behavior. 



Keywords. Denoising, thresholding estimation, extreme value analysis, Gumbel dis- 
tribution, Berman's inequality, wavelet thresholding, curvelet thresholding, translation 
invariant wavelets, extreme value threshold, frames, redundant dictionaries. 

* Corresponding author. E-mail address: ma.haltmeier@gmail.com 



1 



1 Introduction 



We consider the problem of estimating a signal or image u n in d independent spatial dimen- 
sions from noisy observations 

V n (A;) = u n (k) + e n (k) , for k G I n := {0, ... ,n - l} d . (1.1) 

Here e n (fe) ~ iV(0, <r 2 ) are independent normally distributed random variables (the noise), 
n is the level of discretization, and a 2 is the variance of the data (the noise level). The signal 
u n is assumed to be a discrete approximation of some underlying continuous domain signal 
obtained by discretizing a function u: [0, l] d —> R. One may think of the entries of u n as 
point samples u n (k) = u(k/n) on an equidistant grid. However, in some situations it may 
be more realistic to consider other discretization models. Area samples, for example, are 
more appropriate in many imaging applications. Anyway, most of the presented results do 
not crucially depend on the particular discretization model as long ELS Cclll be associated 
with a function u* n : [0, l] d — > R (some kind of abstract interpolation) which, in a suitable 
way, tends to u as n — > oo. 

The aim of denoising is to estimate the unknown signal u n : — (u n (k) : k G I n ) G R /n 
from the data V n := (V n (k) : k G J n ) G R 7 ™. The particular estimation procedure we will 
analyze in detail is soft-thresholding in frames and overcomplete dictionaries. We stress, 
however, that a similar analysis also applies to different thresholding methods, such as block 
thresholding techniques. 



1.1 Denoising by Soft-Thresholding 

Thresholding is by now a standard method for signal and image denoising (see, for example, 
[U [161 Ell [231 (2H1 E21 EH H3 [55] for surveys and some original references). It consists of the 
following three basic steps: 

(1) Frame Analysis: Choose a family T> n = (02 : w G 0„) of normalized analyzing 
elements in R 7 ™ and compute the coefficients V n ) of the data with respect to T> n . 
(Here Q n is some finite index set.) 

(2) Thresholding: For some threshold T G (0, oo) define the nonlinear soft-thresholding 
function 

'v + T ifv<-T 
S(-,T) : R -»■ R: u ^ S (v,T) := <v - T if v > T (1.2) 

otherwise 

V 

(which may be written in the compact form S (v, T) = sign (v) (\v\ — T),) and apply 
S(-,T) to each coefficient (</>™, V^). The resulting thresholded coefficients 5 , ((0™, 14), T) 
are considered as estimates for the signal coefficients u n)- 

(3) Frame Synthesis: Apply a reconstruction operator (usually the synthesis operator 
corresponding to the dual frame) to (5 , ((0™, V n ), T) : uj G fl n ), yielding the desired 
estimate for the signal u n . 
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Certainly, the most prominent instance of the above procedure is soft-thresholding in a 
wavelet basis [22j[23]. In this case, any step can be computed in C(|I n |) operation counts 
and hence the overall procedure of wavelet soft-thresholding is linear in the number \I n \ 
of unknown parameters (see [21 [231 US] ) ■ It is thus not only conceptually simple but also 
allows for fast numerical implementation, which other denoising techniques do not provide. 
Even simple linear spectral denoising techniques using the FFT algorithm have a numerical 
complexity of C(|/ n | log|I n |) floating point operations. Besides these practical advantages, 
wavelet thresholding also obeys theoretical optimality properties. It yields to an almost 
optimal mean square error simultaneously over a wide range of function spaces (including 
Sobolev and Besov spaces) and, at the same time, has a smoothing effect with respect to any 
of the norms in these spaces. Hence soft-thresholding automatically adapts to the unknown 
smoothness of the desired signal [22} [24] . 

1.2 Choice of the Threshold and Smoothness 

Any particular choice of the thresholding parameter T is a tradeoff between signal approxi- 
mation and noise reduction: A large threshold removes much of the noise but also removes 
parts of the signal. Hence a reasonable threshold choice should be slightly below the max- 
imum of \{4> r ^,e n )\ suc h that a significant amount of the noise is removed whereas essential 
details of the signal are kept. The smaller the actual threshold is taken, the more emphasis 
is given on signal representation and the less emphasis on noise reduction. Designing trade- 
offs between noise reduction and signal approximation requires a precise understanding of 
the statistical distribution of max £je Q n |(^, e n )\. 

In the case that T> n is an orthonormal basis, a commonly used thresholding value is the 
so called universal threshold T = cy^log |f2 n |, as proposed in the seminal work [23]. It 
satisfies the asymptotic denoising property (see [22| [35 | [34] , |4"5]) 



P < max | (4>™, e n ) | < o~\/2 log |f2 n | > =: 7r n — > 1 as n — > oo . (1-3) 

Equation (|1.3p implies, that the estimates obtained with the universal threshold are, with 
probabilities ir n tending to one, at least as smooth as u n . Here smoothness is measured, 
for example, in terms of any weighted £ p -norm (with p > 1) of the frame coefficients; see 
Section [3.51 It has been shown in |50j, that also the following limiting law holds 

max e n )| = (7-^/2 log \Q n \ + o ( y^log |fi n | J almost surely as n — > oo . (1-4) 

The combination of (jl.3p and ()1.4[) implies, that the universal threshold 0\j2 log \Q n \ is, 
with high probability, indeed slightly above the maximum of \{4>™, e n )\ and therefore, in first 
order, is the smallest sequence guaranteeing smooth reconstructions. 

Nevertheless, as has been noted frequently in the literature, the threshold U\j2 log |J7 n | is 
often too large in denoising applications |24[ [45] . As shown in Figure 11.11 the probability 
that max wg ftj((/)™, e n )\ < a \/2 log |f2 n | is quite high for finite n. Hence the universal thresh- 
old completely removes the noise with high probability; however with the side effect that 
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Figure 1.1: Probabilities 7r n denned in (jl.3p for n G {512, . . . , 5120}. We denote the sequence 
(TTn)neN as confidence trace in order to highlight that it measures the amount of smoothness 
of the reconstruction guaranteed by the thresholds a y / 21og \0, n \. 

relevant information of the signal may be lost. This is well documented in the literature 
(see, for example, [H [24"1 I45| ) and has initiated further research on refined thresholding 
procedures. In this paper we want to shed further insight to the thresholding problem and 
provide a flexible framework for threshold selection based on a variable tradeoff between 
signal approximation and noise reduction. We also provide a selective review on current 
thresholding methodology where we aim to elaborate on the link between statistical ex- 
treme value theory and thresholding techniques. To that end, we show that after linear 
rescaling the distribution of max we n n K02, £n)| converges to the Gumbel distribution and 
explicitly compute the required normalization constants. We further verify that the same 
distributional convergence result also holds for a wide class of redundant frames including 
wavelets, curvelets and unions of bases. Based on these limiting laws we introduce a class of 
extreme value thresholds which can be considered as second order correction of the universal 
threshold a^/2k^Q of [23]. 

As has been pointed out already in the seminal work [22] on soft-thresholding in wavelet 
bases, any denoising method should provide estimates that are, with high probability, at 
least as smooth as the original image. Our results in particular give an affirmative, quanti- 
tative, answer to the question which thresholding sequences yield asymptotic smoothness. 
To this end we shall introduce the notion of asymptotically stable frames in the next sub- 
section. This class is shown to include, for example, practically relevant redundant wavelet 
systems, curvelet frames and unions of bases (see Theorems 14.41 14.71 and I4.13p . However, 
we also present an important example (in the form of the translational wavelet system; see 
Theorem I4.9P which shows that highly redundant systems may show a different asymptotic 
behavior. 




4 



1.3 Main Results 



For any n G N, let T> n = (<^™ : uj G O n ) denote a frame of MJ n that consists of normalized 
frame elements (that is, ||(/>2ll = 1 holds for all u G $7 n ) and has frame bounds A n < B n 
(compare Section 12. ip . Our main results concerning thresholding estimation in the frame 
T> n will hold for asymptotically stable frames, which are defined as follows. 

Definition 1.1 (Asymptotical stability). We say that a family of frames (£> n )neN with 
normalized frame elements is asymptotically stable, if the following assertions hold true: 

(i) For some p G (0, 1), we have \{{uj,uj') G : > p}\ = o (^/==) • 

(ii) The upper frame bounds B n are uniformly bounded, that is, 
B : = sup {B n : n G N} < 00. 

In the Section U] we shall verify that discrete decimated wavelet and curvelet frames are 
indeed asymptotically stable. Asymptotical stability typically fails for frames without an 
underlying infinite dimensional frame. A prototype for such a family is the dyadic translation 
invariant wavelet transform (see Section f4. 1 .4j) . In this case, the redundancy of T> n increases 
boundlessly with increasing n, which implies that the corresponding upper frame bounds 
tend to infinity as n — > 00. 

The following Theorem ll.2l is the key to most results of this paper. It states, that after proper 
normalization the distribution of max wg nj(02> e n)\ converges to the Gumbel distribution 
- provided that the frames are asymptotically stable. Based on the limiting extreme value 
distribution we propose a class of thresholding techniques that will be shown to provide 
asymptotically sharp confidence regions and yield to smoothness estimates for general frame 
thresholding; see Section [3] for details. 

Theorem 1.2 (Limiting Gumbel law). Assume that (X\i)neN is an asymptotically stable 
family of frames, and let (e n ) ng N be a sequence of random vectors in R^ n with independent 
N(0, a 2 ) -distributed entries. Then, for every z G M., 



2z — log log I Q„ I — log 7T 



lim P < max |(C> e n)| < o^Jl log |fi n | + a " > = exp (-e 

n^oo lu>efi„ lx 2 1 /2log|fy I 



;i.5) 



The function z i— > exp ( — e z ) is known as the Gumbel distribution. 

Proof. See Section I3.ll □ 



In the case that T> n are orthonormal bases, results similar to the one of Theorem I1.2I follow 
from standard extreme value results (see, for example, [19\ I43|) and are well known in the 
wavelet community (see, for example, |22} [34l I45]). However, neither the convergence of 
the maxima including absolute values (which is the actually relevant case) nor the use of 
redundant systems are covered by these results. 
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The proof of Theorem 11.21 relies on new extreme value results for dependent chi-square 
distributed random variables (with one degree of freedom) which we establish in AppendixlAl 
In the field of statistical extreme value theory, the following definition is common. 

Definition 1.3 (Gumbel type). A sequence (M„) ngN of real valued random variables is said 
to be of Gumbel type ( or to be of extreme value typ I), if there are real valued normalizing 
sequences (a n ) ngN and (6 n ) ngN , such that the limit P {M n < a n z + b n } — > exp ( — e~ z ) as 
n — > oo holds point-wise for all z £WL. 



Using the notion just introduced, Theorem 11.21 states that max U)€ a n \((j)^ J , e n )\ is of Gumbel 
type, with normalizing sequences <ra(x, \Q n \) and o~b{x, |^n|)> where 

<X,\n n \) := 1 (1.6) 
v / 21og|Q n | 

log log IfLJ + log7T 



b( X , |fi„|) := V21og|0„| - V 'i in ■ " ' (L7) 

2^/2 log | o„ i 

As shown in Theorem 13.31 the maxima of (</>™, e n ) without taking absolute values are also of 
Gumbel type. We emphasize, however, that the corresponding normalizing sequences differ 
from those required for the maxima with absolute values. Indeed, max wg Q n | e n ) | behaves 
as the maximum of 2\VL n \ (opposed to |Sl n |) independent standard normally distributed 
random variables; compare with Remark IA.6I in the appendix. The different behavior of 
the maxima with and without absolute values does not become visible in Equations (ll.3p 
and (|1.4p . which are exactly the same for the maxima with and without absolute values. 
Only in a refined distributional limit, such as (II. 5p . the additional fluctuations due to the 
absolute values become visible. Moreover, in the case that V n are redundant, no result 
similar to Theorem 11.21 is known at all. 



1.4 Outline 

In the following Section [2] we introduce some notation used throughout this paper. In par- 
ticular, we define the soft-thresholding estimator in redundant frames. The core part of 
this paper is Section [3l where we proof the asymptotic distribution of the frame coeffi- 
cients claimed in Theorem 11.21 This result is then applied to define extreme value based 
thresholding rules and corresponding sharp confidence regions. Moreover, in this section 
we show that the resulting thresholding estimators satisfy both, oracle inequalities for the 
mean square error and smoothness estimates for a wide class of smoothness measures. Our 
proofs require new facts from statistical extreme value theory for the maxima of absolute 
values of dependent normal random variables. These results will be derived in AppendixlAl 
Whereas related theorems without taking absolute values can be found in the literature, the 
main result of this section, Theorem IA.8|, seems to be the first one of his kind. 

Finally, in Section[5]we demonstrate that many frames (including redundant wavelet frames, 
curvelet frames and unions of bases) are covered by our results. A major consequence is that 
the standard Gumbel limit occurs quite 'generically'. However, also an important counter 
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example is presented. We show that the fully translation invariant wavelet system is an 
example of a frame that fails to be asymptotically stable. It is further shown that even the 
result of Theorem 11.21 does not hold in this case and that the maximum of the translation 
invariant wavelet coefficients is strictly less (in a proper sense) than the maximum of the 
same number of independent coefficients; see Theorem 14.91 This reveals the notion of 
asymptotically stable frames to be almost necessary for the Gumbel extreme value result 
for the maxima of the frame coefficients to hold. 

2 Thresholding in Redundant Frames 

For the following recall the model (jl.ip and write e n = (e n (k) : k € I n ) G K 7 " for the noise 
vector in (jl.ip . We assume throughout that the variance a 2 of the noise is given. Fast and 
efficient methods for estimating the variance are well known (see, for example, [46]). 

Throughout this paper all estimates for the signal u n are based on thresholding the coeffi- 
cients of the given data V n with respect to prescribed frames of analyzing elements. 

2.1 Frames 

In the sequel V n := (<^>™ : uj G Q n ) C M. In denotes a frame of W In , with Q n being a finite 
index set. Hence there exist constants < A n < B n < oo, such that 

(V«„el ; ") A n \\u n \\ 2 < Y, \(^,u n )\ 2 < B n \\u n \\ 2 . (2.1) 

(Here || • || is the Euclidean norm on W In and ( • , • ) the corresponding inner product.) The 
largest and smallest numbers A n and B n , respectively, that satisfy (|2.ip are referred to as 
frame bounds. Notice that in a finite dimensional setting any family that spans the whole 
space is a frame. 

We further denote by : W In — > Ep n the operator that maps the signal u n 6 M 7 ™ to the 
analyzing coefficients with respect to the given frame, 

(Vw G fi n ) (*„u„) (w) := (C Un) . 

The mapping <l? n is named the analysis operator, its adjoint <&* the synthesis operator, and 
<f> n <f> n the frame operator corresponding to T> n . 

The frame property (|2.ip implies that the frame operator &* n Q n : 1R 7 ™ — > M. In is an invertible 
linear mapping. Hence, for any oj £ Q n , the elements 

C := (^n)- 1 C 

are well defined and the family (d>™ : uj G ft n ) is again a frame of M 7n . It is called the dual 
frame and has frame bounds 1/B n < 1/A n . 
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Finally, we denote by := (<&*<!>„) <&* the pseudoinverse of the analysis operator 
Due to linearity and the definitions of the pseudoinverse and the dual frame elements, we 
have the identities 

(Vu„ G R In ) u n = *+* n u„ = ^ u n ) ^ • (2-2) 

In particular, the mapping is the synthesis operator corresponding to the dual frame. 
Equation (|2.2p provides a simple representation of the given signal in terms of its analyzing 
coefficients. This serves as basis of thresholding estimators defined and studied in the 
following subsection. For further details on frames see, for example, [131 H5| . 

Remark 2.1 (Thresholding in a subspace). It is not essential at all, that T> n is a frame of 
the whole image space M. n . In fact, in typical thresholding applications, such as in wavelet 
denoising, the space M In naturally decomposes into a low resolution space having small 
fixed dimension and a detail space having large dimension that increases with n. The soft- 
thresholding procedure is then only applied to the signal part in the detail space and hence 
it is sufficient to assume that T> n is a frame therein. In order to avoid unessential technical 
complication we present our results for the case of frames of the whole image space. In the 
concrete applications presented in Section [7] the thresholding will indeed only be performed 
in some subspace; all results carry over to such a situation in a straightforward manner. ♦♦♦ 

2.2 Thresholding Estimation 

By applying <l? ra to both sides of the original denoising problem in the signal space 

is transferred into the denoising problem 

Y n {u) = x n (u) + (<J> n e n ) (u) , for w e fi n , (2.3) 

in the possibly higher dimensional coefficient space R^ n . Here and in the following we 
denote by 

Y n {u) := {<j%,V n ) and x n (w) := (<#>n) , (2.4) 

the coefficients of the data V n and the signal u n with respect to the given frame. The 
following elementary Lemma 12.21 states that the noise term in ()2.3[) is again a centered 
normal vector but has possibly non-vanishing covariances. Indeed it implies that the entries 
of & n e n are not uncorrelated and hence not independent, unless T> n is an orthogonal basis. 

Lemma 2.2 (Covariance matrix). Let e n be a random vector in the image space W In with 
independent N(0, a 2 ) -distributed entries. Then <& n e n is a centered normal vector in Mp n 
and the covariance matrix of & n e n has entries Cov (& n e n (uj), <& n e n (a/)) = cr 2 (<^2, ^"/)- 

Proof. As the sum of normal random variables with zero mean, the random variables 
(<& n € n ){<jj) = J2kei n ^oj W e n (k) are again normally distributed with zero mean. In partic- 
ular, we have Cov (<& n e n (u;), <fr n e n (a/)) = E (<l?„e„(u;)<l> n e n (u/)) . Hence the claim follows 
from the linearity of the expectation value and the independence of the one-dimensional 
random variables e n {k). □ 
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Recall the soft-thresholding function S (v,T) = sign (v) (\v \ — T) , defined by Equation (jl.2p . 
The thresholding estimators we consider apply S(-,T) to each coefficient of Y n in (|2.3p to 
define an estimator for the parameter x n . In order to get an estimate for the signal u n 
one must map the coefficient estimate back to the original signal domain. This is usually 
implemented by applying the dual synthesis operator (compare with Equation (|2.2p ). 

Definition 2.3 (Frame thresholding). Consider the data models U.l\) and h2.3\) and let 

T > be a given thresholding parameter. 

(a) The soft-thresholding estimator for the parameter x n G Mp n using the threshold T is 
defined by 

x n = S (Y n ,T) := (S(Y n (u),T) : to G ft n ) G R n " . (2.5) 

(b) The soft-thresholding estimator for the signal u n with respect to the frame T> n using 
the threshold T is defined by 

n n = *+oS(* n y n ,T)= S({<%,V n ),T)fc. (2.6) 

Hence the frame soft-thresholding estimator u n is simply the composition of analysis with 
3> n , component-wise thresholding, and dual synthesis with <fr+. 

If T> n is an overcomplete frame, then $ n has infinitely many left-inverses, and the pseudoin- 
verse used in Definition 12.31 is a particular one. In principle one could use other left inverses 
for defining the soft thresholding estimator (I2.6p . Since, in general, S (Y n ,T) Ran (& n ) is 
outside the range of 3> n , the use of a different left inverse will result in a different estimator. 
The special choice 3?^ has the advantage that for many frames used in practical appli- 
cations, the dual synthesis operator is known explicitly and, more importantly, that fast 
algorithms are available for its computation (typically algorithms using only 0(|/ n | log|/ n |) 
or even C(|/ n |) floating point operations [45]). 

Remark 2.4 (Thresholding variations). Instead of the soft thresholding function S(-,T) 
several other nonlinear thresholding methods have been proposed and used. Prominent 
examples are the hard thresholding function z \-± zx{\ z \>t} an d the nonnegative garrote 
z i — y zmaxjl — T 2 /z 2 ,0} of JH Strictly taken, the smoothness estimates derived in 

Section \3.5\ only hold for thresholding functions F(v,T) satisfying the shrinkage property 
\F[v ± T, T)| < \v\ for all This property is, for example, not satisfied by the non- 

negative garrote. In this case, however, similar estimates may be derived under additional 
assumptions on the signals of interest. Other prominent denoising techniques are based on 
block-thresholding (see, for example, |21 [7| \31)j). In this case, the derivation of sharp 
smoothness estimates requires extreme value results for dependent x 2 -distributed random 
variables (with more than one degree of freedom). Such an extreme value analysis seems 
possible but is beyond the scope of this paper. Our given results can be seen as the first step 
towards a more general theory. ♦♦♦ 

Remark 2.5 (Multiple selections). Be aware, that we allow certain elements (ft™ to be 
contained more than once in the frame T> n . Hence we may have \{(ft™ : cu G Q n }\ < |^n|- Such 



9 



multiple selections often arises naturally for frames that are the union of several bases having 
some elements in common. A standard example is the wavelet cycle spinning procedure 
of fThy . where the underlying frame is the union of several shifted orthonormal wavelet bases 
(see Section \4-1.3 ). Multiple selections of frame elements also affect the pseudoinverse and 
finally the soft-thresholding estimator. Hence, if ((/>" : ui G Q n ) an d (ip^ ■ A G A n ) denote 
two frames composed by the same frame elements, : u G Sl n } = {ip™ : A G A n }, but 
having different cardinalities \Q n \ / |A n |, then the soft-thresholding estimators corresponding 
to these frames differ from each other. ♦♦♦ 



2.3 Rationale Behind Thresholding Estimation 

We conclude this section by commenting on the underlying rationale behind thresholding 
estimation and situations where it is expected to produce good results. 

The basic assumption underlying thresholding estimation is that the frame operator sepa- 
rates the data into large coefficients due to the signal and small coefficients mainly due to 
the noise. For additive noise models V n = u n + e n both issues can be studied separately. In 
this case, one requires that for some threshold T (which finally judges between signal and 
noise) the following two conditions are satisfied: 

(1) Coherence between signal and frame: The signal u n is well represented by few 
large coefficients having \((j)™,u n )\ > T. 

(2) Incoherence between noise and frame: With high probability, all noise coeffi- 
cients with respect to the frame V n satisfy |(<?^,e n )| — ^ ■ 



In the following sections we shall see, that Item (2) can be analyzed in a unified way for 



asymptotically stable frames. Item (1) , however, is more an approximation issue rather than 
an estimation issue. Given a frame, it, of course, cannot be satisfied for every u n G W 1 . The 
choice of a 'good frame' depends on the specific application at hand and in particular on 
the type of signals that are expected. The better the signals of interested are represented 
by a few but large frame coefficients, the better the denoising result will be. The richer the 
analyzing family is, the more signals can be expected to be recovered properly. The price 
to pay must be, of course, a higher computational cost. 

The following two simple examples demonstrate how the use of redundant frames may 
significantly improve the performance of the thresholding estimation. 

Example 2.6 (Thresholding in the sine basis). We consider the discrete signal u n G 1" 
defined by u n (k) = 5\/2/16 sin {/Kuj\k/n\ + 5\/2/16 sin (Truj^k/n) , which is a superposition 
of two sine waves having frequencies uj\ = 150 and u>2 = 380, respectively, and amplitudes 
5\/2/16 ~ 0.45. The left image in Figure \2.1\ shows the signal u n and the noisy data 
V n = u n + e n obtained by adding Gaussian white noise of variance equal to one to the 
signal. Apparently, there seems little hope to recover u n from the data V n in the original 
signal domain. Almost like a miracle, the situation changes drastically after computing 
the coefficients with respect to the sine basis (n~ l l 2 sm(irujk/n) : uj = l,...,n). Now, the 
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Figure 2.1: Left: Signal u n (superposition of two sine waves) and data V n = u n + e n from 
Example 12.61 Right: Coefficients of the signal and the data with respect to the sine basis. 



signal and the noise are clearly separated as can be seen from the right image in Figure [Ql 
Obviously we will get an almost perfect reconstruction by simply removing all coefficients 
below a proper threshold. ♦♦♦ 




Figure 2.2: Left: Coefficients of the signal u' n and the data V^ = u' n + e n from Example 12.71 
with respect the sine basis. Right: Coefficients of the same signal and data with respect to 
the two times oversampled sine frame. 

Example 2.7 (Thresholding in a redundant sine frame). The signal in Example \2.6\ is a 
combination of sine waves with integer frequencies covered by the sine frame. However, in 
practical application the signal may also have non-integer frequencies. In order to investigate 
this issue, we now consider the signal u' n (k) = by/2/16 sin (nuj^k/n) +5\/2/16 sin (jr^k/nj 
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Figure 2.3: Left: Signal u' n , and the reconstructions from the data = u' n + e n by soft- 
thresholding in the sine basis and in the overcomplete sine frame, respectively. Only the 
first 64 components are plotted. Right: Sine wave 5V2/16 sin yKUik/n) and residuals of 
the two reconstructions. As can be seen, thresholding in the sine frame almost perfectly 
recovers the signal u n , whereas the result of thresholding in the sine basis is useless (the 
residual is almost equal to the displayed sine wave of frequency u^.) 

having frequencies u[ = 150.5 and U2 = 380 (hence oj^ is a slight perturbation of the fre- 
quency uj\ considered in Example \2. 6\) . The new signal u' n is not a sparse linear combination 
of elements of the sine basis. As a matter of fact, the energy of the first sine wave is spread 
over many coefficients and thus submerges in the noise. And indeed, as can be seen from the 
left image in FigurelKM the low frequency coefficient disappears. However, by taking the two 
times redundant frame (n -1 / 2 sin (iruk/n) : uj = {1/2, 1, . . . , n}) instead of the sine basis, 
the coefficient due to frequency u}[ appears again in the transformed domain. Moreover, as 
can be seen from Figure \2.3\ the reconstruction by thresholding the coefficients with respect 
to the overcomplete sine frame is almost perfect, whereas the reconstruction by thresholding 
the basis coefficients is useless. ♦♦♦ 

In Examples 12.61 and 12.71 the threshold choice is not a very delicate issue since the signal 
and the noise are separated very clearly in the transformed domain. Indeed as can be seen 
from the right plots in Figures 12.11 and 12.21 there is a quite wide range of thresholds that 
would yield an almost noise free estimate close to the original signal. However, if the signal 
also contains important coefficients of moderate size, then the choice of a good threshold 
is crucial and difficult. This is typically the case for image denoising using wavelets or 
curvelet frames: Natural images are approximately sparse in these frames but almost never 
strictly sparse. The particular threshold choice now will always be a tradeoff between noise 
removal and signal representation and becomes a delicate issue. In order to develop rationale 
threshold choices, a precise understanding of the distribution of \{4>Zi u n)\ is important. This 
is the subject of our following considerations. 
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3 Extreme Value Analysis of Frame Thresholding 



Now we turn back to the denoising problem (jl.lj) . After application of the analysis operator 
3> n corresponding to the normalized frame T> = : u) G f2 n ) our aim is to estimate the 
vector x n G from given noisy coefficients (compare with Equation ()2.3p ) 

ui) , for oj G £l n . 

Here 3> n e n is the transformed noise vector which is normally distributed, has zero mean and 
covariance matrix K n (u,u') = o~ 2 ((j)™,(f)™,}; see Lemma 12, 21 In this section we shall analyze 
in detail the component- wise soft-thresholding estimator x n = S(Y n ,T) defined by (|2,5I) . 
We will start by computing the extreme value distribution of 3? ra e ra claimed in Theorem II .21 
Based on the limiting law will then introduce extreme value thresholding techniques that 
will be shown to provide asymptotically sharp confidence regions. 



3.1 Proof of Theorem CC2 

The main aim of this subsection is to verify Theorem 11.21 which states that the distribution 
of the maxima of the noise coefficients Q n e n {uj) are of Gumbel type with explicitly given 
normalization constants. The proof of Theorem 11.21 will be a consequence of Lemmas 13.11 
and 13.21 to be derived in the following. The main Lemma 13. II relies itself on a new extreme 
value result established in Appendix lAl 

Lemma 3.1. Let (Cn) ngN be a sequence of normal random vectors in Mp n with covariance 
matrices K n having ones in the diagonal. Assume additionally, that the following holds: 

(i) For every 5 G (0, 1) we have |{(cj,u/) G £l 2 n : \K n {u,u')\ > S}\ = O (|O n |). 

(ii) For some p G (0, 1) we have | { (co, oo') G : \K n (co, uj')\ > p} \ = o ^|Sl n | / A/log . 

(Hi) We have B := sup {Ylui'en n co')\ 2 : n G N and uj G J7 n } < oo. 

Then, H^nlloo is of Gumbel type (see Definition \l.'J\) with normalization constants a(x, |^n|) 
and b(x, \&n\) defined by il.6\) and \1. 7) . 



Proof. Let (£ n ) ngN be a sequence of normal random vectors satisfying Conditions (f 



in 



According to Theorem IA.8I in the Appendix, in order to establish the claimed result, it is 
sufficient to show that 

/log |0„| \ 1 /( 1 +\ l ^n(^')\) 

R n := |k„(u;,u/)| I ^- ] —^0 as n — > oo . 

This will be done by splitting the sum R n into three parts and showing that each of them 
tends to zero as n — > oo. For that purpose, let 5 G (0,1/3) be any small number, let 
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p G (0, 1) be as in Condition I (ii) I and define 



A n (1) 
A„(2) 
An (3) 



{(cu,co')en 2 n 

{(u;,u/)g0 2 



uj ^ uj' and |K n (cJ, > p} 
5 < |«n(w, < p] , 

\K n (u3,U}')\ < 5} . 



We further write R n = R n (1) + R n (2) + R n (3) with 



R n (i) = |k„(w,w')| 

(w,w')eAn(<) 



log Kjnj 
10 I 2 



for t G {1,2,3} 



It remains to verify that any of the terms R n (i) converges to zero asru-oo. 

• Since any £ n is a normal random vector with zero mean and unit variance, we have 
\K n (oj,U)')\ < 1 for any index pair (co,uj r ) G O 2 , which yields the inequality R n (1) < 



|A n (1)| y / log]O n ~[/|O n |. By Condition [(ii)] we have |A n (l)| = o (jO n | / 
which shows that R n (1) — > as n — > oo. 

To estimate the second sum R n (2), recall that by definition of the set A n (2), we 
have \K n (uj,uj')\ < p for any pair of indices G A n (2). Moreover, recall that by 

Condition (i) we further have |A n (2)| = O (|O n |). Hence we obtain 

'log|^ 1/(1 ^ 



R n {2) < A n (2) 



I O n I 



(log|O n |) 1 /^o(|O n | 1 - 2 /( 1 ^) . 



Since by assumption p < 1, the inequality 1 — 2/(1 + p)<0 holds which implies that 
we have R n (2) — > as n — > oo. 

It remains to estimate the final sum R n (3). The Cauchy-Schwarz inequality, Condi- 
tion 



(iii) and the estimate \K n (oj,uj')\ < 5 yield 



-Rn(3) 2 < ^2 \k u (u;,uj')\ 2 

(w,£j')eA„(3) (w,u/)eA„(3) 



l0g|0nh 2/(1+5) 



i a 



< B |O n | 



fog I Or. 

10 I 2 



2/(1+5) 



O n | 2 = (log|0„|) 2/(1+5) o(|O r . 



,3-4/(1+5) 



Now, by assumption the inequality 5 < 1/3 holds and hence we have 4/(1 + 5) > 3. 
This implies that also R n (3) tends to zero as n — > oo. 

In summary, we have verified that R n (i) — > as n — > oo for every i G {1, 2, 3}. Hence their 
sum R n converges to zero, too. The claimed distributional convergence results now follows 
from Theorem IA.8I in the Appendix and concludes the proof. 



□ 



We next state a simple auxiliary Lemma that bounds the number of inner products 
being bounded away from zero. 
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Lemma 3.2. For any n let (6™ : oj G fi n ) be a family of normalized vectors in W n , such 
that the upper frame bounds B n are uniformly bounded. Then, for every 5 > 0, we have 

\{Mei%:\(<%,4fr)\>8}\=0(\fl n \) . (3.1) 



Proof. To verify (|3.1|) it is sufficient to find, for every given 5 > 0, some constant K G N 
such that 

(VneN)(V^Gft n ) \{uj' en n :\(<j>Z, 6}\<K . (3.2) 

Indeed, if (|3.2p holds then summing over all uj G Vl n yields (|3.ip . 

To show (|3.2p we assume to the contrary that there is some 5 > such that for all m G N 
there exists some n (to) G N and some w G ^ n ( m ) such that the set A m = {a/ G Q n ( m ) ■ 

m {m) AT ] )\>5} contains more then to elements. By Assumption we have the equality 
1 1 ^n(m) || _ j a ^ ^ g ^ Together with Assumption | (ii) | this implies 

^ = i?||c (m) || 2 > E Kc (m) ,C' m) )! 2 > E |<c (m) ,C' m) >! 2 >^- 

Since the last estimate should hold for all to G N and we have B < oo by assumption, this 
obviously gives is a contradiction. □ 

Proof of Theorem 11.21 Theorem is now an immediate consequence of the above re- 
sults: Lemma \2.& and Lemma \3.%\ show that the sequence of normalized frame coefficients 



(<& n e n /cr) ng N satisfies Conditions (i)-(iii) of Lemma \3.l\ Hence Lemma \3.1\ applied to the 



random vectors £ n = <& n e n /a shows that 

,. -of KC,en>| , fT-, iT^-r , 2z-loglog|O n | -logvr) , . 

hm P < max < v21og \U n \ H ; > = exp — e , 

n^oo \„en n a " v Sl n| 2 v /21og|fi„| J V ; 

pointwise for any z G M. After bringing the standard deviation a to the right hand side in 
the inner inequality yields the limiting law claimed in Theorem \1.2L 

We conclude this subsection by stating an extreme value result for the maximum without 
the absolute values. Although we do not need this result further, the distributional limit is 
interesting in its own. 

Theorem 3.3 (Limiting Gumbel law without absolute values). Assume that the frames 
T> n are asymptotically stable and let (e n ) n( =N be a sequence of random vectors in hav- 
ing independent N (0,1) -distributed entries. Then, the random sequence of the maxima 
max($ n e n ) := ih£lx-[^(^^, 6 n ) : uj G of GiLTTib&l type with TioTTfiQ>lizQ>tioTi constants 

a(N,\n n \) := 1 (3.3) 
V21og \il n \ 

h(AT\C>\\ fo\ 1777 log log I I +1 °g ( 4?r ) to A \ 

b(N, \il n \) := a/2 log |O re | . (3.4) 

2 v /21og |O n | 

Proof. The proof is analogous to the proof of Theorem 11.21 and uses the extreme value result 
of Theorem IA.4I for dependent normal random vectors instead of the one of Theorem IA.8I 
for absolute values of dependent normal random vectors. □ 
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3.2 Universal Threshold: Qualitative Denoising Property 



In the case that the family T> n = (6™ : u 6 f2 n ) is an orthonormal basis it is well known that 
the thresholding sequence T n = o" \/2 log 1 0, n | satisfies the asymptotic denoising property 
(see, for example, \22 \ 134 4 |4"5] and also Subsection 11.21 in the introduction), that is, 

lim P{||* n en|U <T n } = l. (3.5) 

n— yoo 

Equation (|3.5p implies that the estimates obtained with the threshold o"-\/2 log \ £l n \ are, with 
probabilities tending to one, at least as smooth as u n . Hence the relation (|3.5p is often used 
as theoretical justification for using the universal threshold choice cr\/2 log |O n | originally 
proposed by Donoho and Johnstone (see [22], [23]). The following Proposition 13.41 states 
that the same denoising property indeed holds true for any normalized frame. Actually it 
proves much more: First, we verify (|3.5p for a wide class of thresholds including the Donoho- 
Johnstone threshold. Second, we show that this class in fact includes all thresholds that 
satisfy the denoising property (|3.5p - provided that the frames are asymptotically stable. 
Our results can be seen as a generalization and a refinement of |22[ Theorem 4.1] from the 
basis case to the possibly redundant frame case. 

Proposition 3.4 (Qualitative denoising property). Assume that D n are frames of M. In 
having normalized frame elements and analysis operators <& n; and let (e n ) n( =N be a> sequence 
of noise vectors in M In with independent N(0, a 2 ) -distributed entries. 



(a) 7/(2? n ) ng N is asymptotically stable, then a sequence (T n ) neS of thresholds satisfies A3. 5\) 
if and only if it has the form 

q n - log log \Q 



T n := <7 a/2 log \Q n \ + a — — with lim q n = oo . (3.6) 



2^2^ | n n 



(b) If (^n)neN i- s n °t necessarily asymptotically stable, then still any sequence \T n ) ^ C 
(0, oo) of the form \3. 6\) satisfies the asymptotic denoising property \3. 5\) . 



Proof, (a) Theorem 11.21 immediately implies that a sequence (T n ) nG ^ satisfies (13. 5p if and 
only if it has the form 

t Avi irTl , 2z„ -loglog|O n | -logvr 

2^2 log \tt n \ 

for some sequence (z n ) n ^ with z n — > oo. Hence the claim follows by taking q n := 2z n — log ir. 

[(b)] Now let V n be any sequence of frames that is not necessarily asymptotically stable. 
Further, let rj n be a sequence of random vectors with independent iV(0, <r 2 )-distributed 
entries. S inc6 *& n c n is el random vector with possibly dependent iV(0, (j^)-distributed entries, 
Lemma I A. 9 1 implies that 



By Item (a) we already know that P{||^ n ||oo < T n } — > 1 as n — > oo, for any sequence of 



thresholds satisfying (I3.6p . and hence the same must hold true for P{||* n en||oo < T n }. □ 
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Proposition 13.41 states that any sequence (<7n) ngN with liuin^^ q n = oo defines a sequence of 
thresholds via (|3.6p that satisfies the asymptotic denoising property. In particular, by taking 
q n = log log | | the thresholds in (|3.6p reduce to the universal threshold a^j2 log |fi n | of 
Donoho and Johnstone. The property f|3.5j) is often quoted as the statistical reason for the 
threshold o"y / 2 log |fi n |. However, except of looking most simple, this threshold plays not 
special role at all among all thresholds satisfying f|3.6|) . 

Remark 3.5 (Use of smaller threshold). As mentioned above the threshold a^j2 log |f2 n | 
is typically too large in many imaging applications. One could use 13. 6\) to design smaller 
threshold sequences based on the following reasoning. First, Proposition \3.4\ shows that 
the asymptotic relation T n ~ crW2 log \ ft n \ alone is not sufficient for the denoising prop- 
erty jS 5\) to hold and that second order approximations have to be considered. Second, taking 
q n = 7 log log |O n | for some 7 G (0, 1) in Equation \3. 6\) indeed yields a smaller thresholding 
sequence T n , in the sense that the second order term {o~y/2 log |f2 n | — T n ) \J2 log |f2 n | tends to 
00, that still satisfies \3. 5\) . It is now tempting to conclude that T n is some kind of improve- 
ment of the Donoho- Johnstone threshold since it is indeed smaller (in a reasonable sense) 
and still satisfies h3. 5|) . However, such a conclusion is quite foolish. For any 'optimized' 
threshold T n one can always find a 'more optimized' threshold T' n with (T n — T^j T n — > 00 
that still satisfies the denoising property 113. 5\) . However, the smaller the threshold T n is 
taken, the slower the convergence of P{ ||3? n e n||oo < T n \ will be. Hence a smaller thresh- 
old satisfying the asymptotic denoising property is not necessarily a better one, but just a 
different compromise between noise reduction and signal approximation. ❖ 

The confusion caused by the arguments in Remark 13.51 has its origin in the qualitative 
nature of (|3.5p . Hence, it is more reasonable to design thresholding rules that control 
P {ll^nCnlloo < Tn} hi a quantitative manner. Such rules are discussed in the following and 
again will follow from the extreme value distribution computed in Theorem 11.21 

3.3 Extreme Value Threshold: Sharp Confidence Regions 

For the following notice that the soft-thresholding estimate x n = S (Y n ,T n ) with threshold- 
ing parameter T n is an element of the || • ||oo-ball 

R(Y n , T n ) := {x n e R Q " : \\x n - < T n ) (3.7) 

around the given data Y n . Our aim is to select the thresholding value T n in such a way, that 
R(Y n ,Tnj is an asymptotically sharp confidence region corresponding to some prescribed 
significance level a, in the sense that the probability that we have x n 6 R(Y n ,T n ^ tends to 
1 — a as n — > 00. By definition, we have x n S R(Y n ,T n ) if and only if \\x n — ln||oo — T n . 
The data model Y n = x n + <& n e n thus implies that 

P{x n €i2(y n ,T n );Vx n €Ran(* n )}=P{||* n e n || 0O <r n } . (3.8) 

Here and in similar situations, a notion like P {x n £ R(Y n ,T n ) ;\/x n G Ran(<J> n )} indicates 
the probability of the intersection of all the events {x n G R{Y n ,T n ^ taken over all x n G 
Ran(* n ). 



17 



Now assume that the frames are asymptotically stable. Then Theorem 11.21 states that the 
probabilities in Equation f|3.8|) with T n = cra[x, \Q n \)z + c&(x> |^n|) tend to the Gumbel 
distribution exp (— exp (— z)). This suggests the following threshold choice based on the 
quantiles of the limiting Gumbel distribution. 

Definition 3.6 (Extreme value threshold). Let (a n ) n gN £ (0, 1) be any sequence of sig- 
nificance levels, denote by z{a n ) = — loglog(l/(l — a n )) the a n -quantile of the Gumbel 
distribution, and set 

rp / m h nr. ttti . 2z(a n ) - loglog |O n | - logvr 

T(a n ,\U n \) := oy21og|S2 rt | + a . (3.9) 

2 a/2 log |^| 

We then name T[a n , \£l n \) the sequence of extreme value threshold (EVT) corresponding to 
the significance levels a n . 

The following Theorem 13.71 states that the EVTs defined by Equation (|3.9p indeed define 
asymptotically sharp confidence regions. Actually it is mere a corollary of the extreme value 
result derived in Theorem II. 2 [ 

Theorem 3.7 (Confidence regions). Let (T> n ) n ^ be a asymptotically stable family of frames 
in and let (a n ) n ^ be a sequence of numbers in (0,1) converging to some a S [0,1). 
Then, with the extreme value thresholds T(a n , defined in Equation i3.9\) . we have 

lim P{x n eR(Y n ,T(a n ,\n n \))-,Vx n eR&n{$ n )} = l-a. (3.10) 

Hence, the sets R(Y n , T(a n , \0, n \) ) defined in ( fg.7| ) are asymptotically sharp confidence re- 
gions with significance level a. 

Proof. According to (|3.8|) it is sufficient to show that P{ ||^ n en||oo < T(a n , 1 0„ | ) } — > 1 — a 
as n — > oo. Theorem 11.21 and the definition of the thresholds in (|3.9|) imply that the 
probability of the event {||3? n e n||oo < T(a n , |0„|)} converges to exp ( — exp(— z(a))) as 
n — y oo. Since the quantile z(a) is defined as the solution of exp (— exp {—z)) = 1 — a this 
yields Equation (IBTTOj) . □ 

Corollary 3.8. Let (P n )neN be any family of frames (not necessarily asymptotic stable) 
having normalized elements, and consider the data model Y n = x n + Q n e n with noise vectors 
e n having possibly dependent N [0, <r 2 ) -distributed entries. Then, it still holds that 

liminf P {x n e R(Y n ,T(a n , \Q n \)) ; Vx„ G Ran(* n )} > 1 - a. (3.11) 

n— too 

Proof. This follows from Theorem 13.71 and Lemma IA.91 □ 

Notice, that in Corollary 13.81 the sets R(Y n ,T(a n , are not necessarily asymptotically 

sharp confidence regions, in the sense that inequality (13, llh may be strict. Actually, we 
believe that asymptotical stability of the frames T> n is close to being necessary for the sets 
R(Y n ,T '(a n , |^n|)) defining asymptotically sharp confidence regions. Actually, for specific 
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highly redundant dictionaries where asymptotic stability fails to hold (such as the translation 
invariant wavelet frame; see Section [4,1.4p we expect that P{ ||3?n e n||oo < ca n z + cr6 n } still 
converges to the Gumbel distribution - however with normalization sequences a n and b n 
being strictly smaller (in a reasonable sense; compare with Remark I3.5P than <ra(x, |^n|) 
and <t6(x, |^n|)- If this is the case, then the smaller thresholds T n — o~a n z(ct n ) -\- o~b n again 
define sharp confidence regions. Surprisingly, results on the distributional convergence of 
H^nCnlloo or even of max (3> n e n ) for redundant frames are almost nonexistent. 

Remark 3.9 (Maxima for normalized sums). The only closely related results that we are 
aware of are for normalized sums of random vectors over certain intervals in one dimension 
and rectangles in higher dimensions (see f3b\ \37[ [33[ \5% |53)/). Exemplarily, we mention a 
result derived in f3h\[5^ . Let Q n = {{k, k + 1, ...,£}: k < £ E {0, ... , n— 1}} denote the set 
of all discrete intervals in {0, ... , re — 1}, let 4>q denote the \\ ■ H2 -normalized characteristic 
function of the interval Q E Q n and let (e n ) nG N be a sequence of normal vectors having 
independent N (0,1) -distributed entries. Then, the rescaled maxima ( max{(</>q, e n ) : Q £ 
Qn} — bn) / a n converge in distribution to the Gumbel distribution for certain explicitly known 
sequences a n and b n being strictly smaller than a(xJQn|) and b{xi\Qn\)- Almost sure 
convergence results for the maxma over rectangles have been obtained in f38[ EE *♦* 

3.4 Rate of Approximation 

Strictly taken, Theorem 13.71 only claims that the || • Hoc-balls R(Y n ,T(a n ,\Q n \)) turn into 
confidence regions in the limit n — > 00, but it does not directly give any result for finite n. 
Sometimes it is argued that, even in the independent case without taking absolute values, 
the rate of convergence of P{max($ n e n ) < T} to the Gumbel distribution is known to be 
rather slow (see, for example, [431 Section 2.4]). Another option could be to derive non- 
asymptotic coverage probabilities along the lines of [40] . however at the price of typically 
quite conservative confidence bands. 

Nevertheless, numerical simulations clearly demonstrate, that even for moderate re, the 
approximation of PjH^nenHoo < aa(x, \&n\)z + o~b(x, l^n|)} with the limiting Gumbel dis- 
tribution is quite good. This even holds true for redundant frames as can be seen from 
Figure 13. 1^ where the distribution functions of the rescaled maxima of the coefficients with 
respect to the two times oversampled sine frame of Example 12.71 are compared with the 
limiting Gumbel distribution. The top line in Figure 13.11 displays the normalized empirical 
distributions of ||<&nen||oo for signal lengths of re = 128, n = 512 and n = 1024 (computed 
from 10000 realizations in each case) and the limiting Gumbel distribution. As can be seen, 
there is no noticeable difference between those functions. The bottom line in Figure 13.11 
shows a Q-Q-plot (quantile against quantile) of those distributions and again indicates that 
the quantiles of the rescaled maximum for finite re are more than sufficiently well approxi- 
mated by the ones of the limiting Gumbel distribution. A theoretical analysis of this matter 
of fact may be subject of future research. 
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3.5 Smoothness Estimates 

We have just seen that the coefficient vector x n is contained in the confidence regions 
R(Y n ,T(a n , \£l n \)) around the data Y n with probability tending to 1 — a. Moreover, by 
definition, the soft-thresholding estimate x n = S (Y n ,T(a n , \ £l n \)) is contained in the region 
R(Y n ,T(a n , |Q„|)) , too. The following theorem shows that the soft thresholding estimate is 
indeed the smoothest element in this confidence region, with smoothness measured in terms 
of a wide class of functionals. 

Theorem 3.10 (Smoothness estimates). Let (j7n) n6N be a family of functionals J n \ Mp n — > 
M U {oo} having the property that 

Jn{x n ) < Jn{xn) whenever \x n {uj)\ < \x n (oj)\ for all oo G £l n ■ (3-12) 

Moreover, consider the data model Y n = x n + Q n £n, where (e n ) ne N is a sequence of random 
vectors with N(0, a 2 ) -distributed entries, let (a n ) ng N be a sequence in (0,1) converging to 
some a G [0,1), and denote x n := S(Y n ,T(a n ,\Q n \)) . Then, 

liminiP {jJx n ) <JJx n );Vx n eRan(* n )} > I- a. (3.13) 

Hence, the soft thresholding estimate x n is at least as smooth as the original parameter x n , 
with probability tending to 1 — a as n — )• oo, where smoothness is measured in terms of any 
family of functionals J n satisfying $3.12\) . 

Proof. The definition of the soft-thresholding nonlinearity immediately implies that x n is 
an element of the confidence region R(Y n , T(a n , \ Q n \)) and that for every other element x n 
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contained in this confidence region we have < |x n (w)| for all u> G Q n . By Corol- 

lary [3?8] the true parameter x n is contained in R(Y n , T(a n , |f2 re |)), too, with a probability 
tending to 1 — a. We conclude that 

liminf P{|x„(w)| < \x n (uj)\ ;oj G £l n \\/x n G Ran(* n )} > 1 -a. (3.14) 

n— >oo 

The assumption (I3.12D on component- wise monotonicity of the functionals J n now im- 
plies that the event {|x n (w)| < |:r n (a;)| ; Vu; G f2 n ;Vx n G Ran($ n )} is contained in the 
event {J n {xn) < Jn^n) ;Vi n G Ran($ n )}. Together with (I3.14p this inclusion property 
yields ([Til?]) . □ 

Remark 3.11 (Shrinkage property). The proof of Theorem \3.1(A uses two main ingredi- 
ents: First, soft-thresholding selects that element in R(Y n ,T(a n ,\Q n \)^ which has mini- 
mal component- wise magnitudes and second, the true coefficient x n is contained in the set 
R(Y n ,T(a n , \Q n \)) with probability tending to I — a. The former property is often referred to 
as the shrinkage property of soft-thresholding and has already been used in ]2%tf for deriving 
smoothness estimates for orthogonal wavelet soft-thresholding in the spirit of 113.13]) . The 
second property relies on our extreme value result derived in Theorem \1.2[ Notice, however, 
that the weaker result P{ l 7 n (x n ) < J n (x n )} — >■ using the threshold o~\j2 log \0, n \ is already 
implied by known results from the orthonormal basis case combined with Sidak's Lemma \A.S{ 
Such a reasoning has indeed been used in \35^ for motivating the threshold o~\j2 log |f2 n | in 
orthogonal wavelet denoising using data with correlated noise. ♦♦♦ 

Remark 3.12 (Strongly redundant case). The proof of Theorem \3.10\ shows that for asymp- 
totically stable frames the considered thresholds T(a n , |O n |) are close to being the smallest 
ones yielding smoothness estimates of the form Ii3.13\) . For strongly redundant frames, how- 
ever, where asymptotic stability fails to hold, smaller thresholds yielding the same smoothness 
bounds can exist. In Theorem we show that this is indeed the case for the dyadic discrete 
translation invariant wavelet system. ♦♦♦ 

Basic but important examples for functionals satisfying the component- wise monotonicity 
property (|3.12p are powers of weighted £ 2 -norms, 

for some c(u) > . 




In the case of wavelet and Fourier frames, these norms of the coefficients provide norm equiv- 
alents to Sobolev norms in the original signal domain (assuming an appropriate discretiza- 
tion model u i— > u n ). Sobolev norms are definitely the most basic smoothness measures of 
functions. More general and also practically relevant classes of smoothness measures are 
Besov norms. Assume for the moment that T> n is a wavelet frame where the index set has 
the multiresolution form Q n = {(A, k) : A G A n and k G -Da} for some index sets A n and -Da 
corresponding to scale/resolution and scale dependent location, respectively. In this case 
one takes the functional J n as one of the norms 




c (A) ||x n (A, •) || 9 for some c(A)>0. 
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These norms again satisfy the monotonicity property (|3. 12[) and moreover yield to norm 
equivalents of Besov norms for properly chosen weights c (A); see Section [4.1 i Such weighted 
(p, g)-norms are also reasonable in combination with other multiresolution systems, such as 
the curvelet frame (see Section 14.21) . 



3.6 Risk Estimates 

Although the main focus in this work is on confidence regions and smoothness estimates, 
in the following Proposition 13.131 we shall verify that using the EVTs of Definition 13.61 
yields risk estimates similar to the oracle inequalities of |23j. The following result is non- 
standard regarding two aspects: First, it allows arbitrary frames instead of orthonormal 
bases. Second, and more importantly, it considers our more general class of extreme value 
thresholds instead of the universal threshold log \ £l n \. 

Proposition 3.13 (Oracle inequality). LetT> n be a frame inM> In with corresponding anal- 
ysis operator $ n . Moreover, let u n = o S ($> n (V n ),T) denote the soft thresholding 
estimator in \2. 6]) corresponding to the extreme value thresholds T = T(a n , \Q n \) defined by 
Equation $3. 9\) . and assume for simplicity that T(a n , |O n |) < cry/ 2 log |f2 n | . Then, we have 

'2 



E (\\u n - u n \\ 2 ^ < ^- ( log (1/(1 -a n )) x/itXog]^ 



(l + 21og|O n |) min jl, 

,.,czc> y 



a 2 



. (3.15) 



Here A n is the lower frame bound ofV^; see Equation \2. 

Proof. Appendix lB.il □ 



4 Examples from Signal and Image Denoising 

In this section we verify that many important frames used for thresholding in signal and 
image processing are asymptotically stable and thus covered by the results of the previous 
section. These examples include redundant wavelet systems and curvelet frames. We also 
consider an important example, where our basic asymptotic stability fails to hold; namely 
the discrete translation invariant wavelet frame. Actually, we show that not even the result 
of Theorem 11.21 (and thus all of its implications) holds in this case. This indicates that the 
stated conditions are indeed close to being necessary for the asymptotical distributional law 
of Theorem 11.21 Further, we derive confidence regions and smoothness estimates for the 
translation invariant wavelet transform that significantly improve over simple application of 
Proposition \3A\ Item |(b)] (and also the main result of [3]). 
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4.1 Redundant and Non- Redundant Wavelet Denoising 

In the following we consider one dimensional wavelet denoising. The generalization to higher 
dimensional wavelet denoising is straightforward. We shall discuss thresholding in biorthog- 
onal wavelet bases, certain overcomplete wavelet frames (using the so called cycle spinning 
procedure), and fully translation invariant wavelet systems. Before considering those partic- 
ular examples, we collect some notation and present basic facts about biorthogonal wavelets 
(which include the orthogonal ones) that we need for the application of our general results. 

4.1.1 Biorthogonal Wavelet Bases 

One dimensional wavelets are generated by dilating and translating a single function, the so 
called mother wavelet. The distinguished feature of wavelet systems is that various classical 
smoothness measures (Triebel, Sobolev and Besov norms) can be characterized by simple 
norms in the wavelet domain. In the following, for the sake of simplicity, we only consider real 
valued periodic wavelets on the interval [0, 1]. Moreover, we restrict ourselves to compactly 
supported biorthogonal wavelets that arise from a multiresolution decomposition. 

Denote by Q the set of all index pairs of the form (j, k) with j G N and k G {0, . . . , 2 J ' — 1}. 
The index j is refereed to as resolution or scale index and k to as the discrete location 
index. Moreover, let (f),ip G I? (M) denote the father and mother wavelet, respectively, 
which are assumed to be compactly supported and to have unit norm with respect to || • H2, 
the Euclidian norm on L 2 (0, 1). For any (j, k) G O one then defines (periodic) wavelets V^fc 
and (periodic) scaling functions 4>j,k by 

(Vi G [0, 1]) ^- fc (t) = V' 2 £ V> (2 J (t-m)-k), <j> j>k (t) = 2^/ 2 £ 4> (2* (t — m) — k) . 

The wavelet and the scaling coefficients of some signal u G L 2 (0, 1) are then simply the 
inner products of u with the wavelets ipj^ and the scaling functions 4>j^, respectively. We 
further write W, V: L 2 (0, 1) — > £ 2 (Q) for the mappings that take the signal u G L 2 (0, 1) 
to the inner products (Wit) (j,k) := (ipj^ju) and (Vu) (j, k) := ((t)j : k,u), respectively. 

In order to get a (biorthogonal) wavelet basis one has to impose some completeness condition 
and some connections between the wavelets and the scaling functions. Such assumptions 
are most naturally formulated in the multiresolution framework (below already adapted to 
the periodic setting). Hence, in the following we assume the existence of subspaces Vj and 
Wj of L 2 (0, 1), referred to as scaling and detail spaces, respectively, meeting the following 
requirements: 

• For every j G N, the following mappings are bijections: 

Vj R v . „ ((^. fc> u ) : € {0, . . . , 2* - 1}) , 
Wj ^ . „ (fa fcj u ):k€{0,...,2>- 1}) . 

• For every j G N, we have the multiresolution decomposition Vj = Vj-i © VVj_i. 
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• The union IJj g N Vj * s dense in I? (0, 1). 

Repeated application of the multiresolution decomposition yields the decomposition of the 
signal space into the sum of the scaling space Vo and the wavelet space W := ©jxjW/. 
Moreover, the above conditions imply that there is a stable one to one correspondence 
between any element in W and its inner product with respect to T> := (ipj : k '■ (j>k) G 
Moreover, the multiresolution decomposition serves as the basis of both, discretization and 
fast implementation. Notice that the construction of compactly supported orthogonal and 
biorthogonal wavelets is non-trivial and such systems have been constructed for the first 
time in |15t I17j. By now such wavelet systems are well known; a detailed construction of 
orthogonal and biorthogonal wavelet systems together with many interesting details may be 
found in mUglHg. 

Remark 4.1 (Biorthogonal basis). If the spaces Vj and Wj are orthogonal, then V is an 
orthonormal wavelet basis and Vj and Wj are spanned by the scaling and wavelet functions, 
respectively. However, we do not require orthogonality in the following. In this more general 
case, the scaling and wavelet spaces are spanned by certain dual systems (or biorthogonal 
bases; thus the name). Biorthogonal wavelets are often preferred to strictly orthogonal ones 
since they allow more freedom to adapt them to a particular application in mind. Especially, 
opposed to orthogonal wavelets, biorthogonal wavelets can at the same time be smooth, sym- 
metric and compactly supported. ❖ 

Remark 4.2 (Computing the wavelet transform). The multiresolution decomposition Vj = 
Vj—i © VVj— i is the basis for fast computation of the wavelet transform. Given the scaling 
coefficients at some scale L > 0, the scaling and wavelet coefficients at scale L — 1 can 
be computed by cyclic convolution of the given scaling coefficients with a certain discrete 
filter pair. Repeated application of this procedure eventually yields all scaling and all wavelet 
coefficients at scales below L. In the case of biorthogonal wavelets, the multiresolution 
decomposition can be inverted again by repeated application of convolution with a different 
pair of reconstruction filters. ♦♦♦ 

Throughout the following we assume that a discrete signal u n G W 1 is given, where the 
discretization number n = 2 L is an integer power of some maximal level of resolution. One 
then interprets the components of the discrete signal as the scaling coefficients of some 
underlying continuous domain signal, that is, 

(Vfc G {0, . . . , n - 1}) u n (k) = (Vu) (L, k) = (<j> L>k ,u) . 

Obviously there are infinitely many continuous domain signals yielding to the same scaling 
coefficients. However, according to the made assumptions, there exists a unique element in 
the scaling space Vl having scaling coefficients u n . This element will be denoted as u* n S Vl- 

The wavelet coefficients of the discrete signal are then simply defined as the wavelet coeffi- 
cients of the continuous domain signal with indices in 

Sin := {{j, k) : j E {0, . . . , J- 1} and A: G {0, . . . , 2* - 1}} . 
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According to the multiresolution decomposition, these coefficients depend only on the dis- 
crete signal and can also be written as discrete inner products 

(Vu n G M n )(V(j, k) G n n ) <^fci «n> := (fe «> • 

This serves as definition of both, the discrete wavelets G R n and the wavelet coeffi- 
cients of u n . Finally we define V n as the family of all discrete wavelets ipj k and denote 
by W n : W 1 — > Mp n the corresponding analysis operator, which we refer to as the discrete 
wavelet transform. 

Remark 4.3 (Numerical computation). The discrete wavelet transform is computed by 
repeated application of the multiresolution decomposition, yielding to all discrete wavelet 
coefficients and the scaling coefficient u\ = {(fio t o,u); see Remark Since the discrete 
filters usually have small support, the wavelet transform can be computed using only O (n) 
operation counts and the same holds true for recovering u n from those coefficients. Notice 
that the discrete wavelets are never computed explicitly in the multiresolution algorithm. 
We defined them in order to verify our general framework. Finally, we stress again that the 
wavelet coefficients of u n coincide with the one of u up to scale log(n/2). ❖ 

4.1.2 Biorthogonal Basis Denoising 

Now consider the denoising problem (jl.ip . which simple reads V n = u n + e n . In wavelet 
denoising the soft-thresholding procedure is usually only applied to coefficients above some 
scale; compare with Remark 12.11 For simplicity we shall consider the case where all 
wavelet coefficients are thresholded but not the scaling coefficient. Hence, the wavelet 
soft-thresholding estimator (for the wavelet part of u n ) is defined by 

u n = W- 1 oS(W n K,T) . 

Thanks to the multiresolution algorithm, the wavelet soft-thresholding estimator can be 
computed with only O (n) operation counts. 

We measure smoothness of the considered estimates in terms of Besov norms. To that end, 
assume that the mother wavelet has sufficiently many vanishing moments and is sufficiently 
smooth. Then, for given norm parameters p, q > 1 and given smoothness parameter r > 
one defines 

|U| = „/V2^<? \\x(j, with s = r+---. 

II Wp,q,r f II yjl >Wp 2 V 

V ieN F 

for any iff 2 (O) and with || • || p denoting the usual £ p -norm taken for fixed scale j G N. It is 
then clear that any of these norms satisfies the component- wise monotony property (|3.12p . 
We further write \\u\\ Rr := ||Wu|| for the corresponding norm of some u G L 2 (0, 1) and 

p,q Pi™)' 

finally denote by B r (0, 1) the set of all signals having finite norm ||tt]| Rr < oo. The pair 

p,q 

(B r p q (0, 1) , || • ||s£ ) is a Banach space and referred to as Besov space. The given definitions 
provide norm equivalents of || • ||fi£ to the definition of Besov norms in classical analysis 
and further do not depend on the particular choice of wavelet basis, as long as the mother 
wavelet has m> r vanishing moments and is m-times continuously differentiable. 
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Theorem 4.4 (Thresholding in wavelet bases). The discrete wavelet bases T> n = : 
(j, k) G £l n ) are asymptotically stable. In particular, the following holds: 

(a) Distribution: Let e n be a sequence of noise vectors in W 1 with independent iV(0, un- 
distributed entries. Then, the sequence ||W n e n ||oo is of Gumbel type with normalization 
constants ua(x,n), ub(x,n) defined by \1.6\) , fli.7| ]. 

(b) Confidence regions: Let (a n )neN C (0,1) be a sequence of significance levels con- 
verging to some a G [0,1) and let T[a n ,n) denote the corresponding EVTs defined 
in (EOJ) . Then, 

lim P {\\W n {u n - V n )\\oo < T(a n ,n) ; Vu n G R In } = 1 - a , 

n— >oo 

(c) Smoothness: Let denote the soft-thresholding estimator using the extreme value 
thresholds T(a ra ,n). If the considered mother wavelet has m > r vanishing moments 
and is m times continuously differentiable, then 

liminf P < \\u* \\nr < \\u\\st ;Vu € Bl J > 1 - a. 



Proof. By definition, for any n G N and pair of any indices (J, k), (j' , k'), the inner products 
(ip'jk^f k') °f ^ ne discrete wavelets coincide with the inner product (tpj t k, 4>j',k') °f the 
continuous domain wavelets. Since the family (V^fc : (j, k) G f2) is a Riesz basis with 



normalized elements this immediately yields Condition (ii) required in the Definition 11.11 for 
asymptotically stable frames. 



Condition (i) of Definition 11.11 is satisfied since indeed all magnitudes \{ipj,ki'4 , j' ,k')\ are 
bounded away from one. To see that this holds true, it is sufficient to consider the case 
where ip(2H — k) and ^(2? t — k') are both supported in the interval (0, 1) and satisfy j' < j. 
Application of the substitution rule yields 



tp(2 j t-k)ip(2 j 't-k')dt 



>(i-i')/2 



c(2 J j I k ■ 2? ' k)tj>{t)dt 



j-j',k-23-J fc> 



Because all wavelets have unit norm, the Cauchy-Schwarz inequality shows that we have 
\{i>j-j',k-2i-i'k^)\ < 1- The upper frame bound implies that Eo',fc) e oJ <>i,fc, ^)| 2 < °o, 
and hence the sequence {{'4>j ! k,' l P} '■ (j; k) G Q n ) in particular converges to zero. As a 
consequence, the numbers \{i/jj,k, i>j',k')\ are uniformly bounded by some constant p < 1. 



The other claims in Items (a) - (c) then follow from the asymptotic stability of the frames 



T> n and the general results derived the previous section. Actually, the first two items are 
just restatements of Theorems II .21 and 13. 71 adapted to the wavelet setting. For Item (c) note 
that the norms || • \\ p ,q, r satisfy the component-wise monotony property (|3.12p and therefore 
Theorem 13.101 yields 



liminf P {||x n ||p,g,r < ||W n tt n || Pi9ir ; Mu G B r p q } > 1 - a with x n := S (V n , T(a n , \tt n \)) ■ 
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By definition we have ||cc n ||p,<j,r = l|^nllsj a an< ^ the inequality || W n ti ra || p gr < HuHgr^ for all 



n which finally yields Item (c) and concludes the proof. □ 



4.1.3 Cycle Spinning 

A mayor drawback of thresholding in a wavelet basis is its missing translation invariance. 
This typically causes visually annoying Gibbs-like artifacts near discontinuities at non-dyadic 
locations. One way to significantly reduce these artifacts is via so called cycle spinning 
(see [16] ) . The idea there is to reduce the artefacts by averaging several estimates obtained 
by denoising shifted copies of the noisy data. 

Let V n = (V^ fc : (j, k) £ £1 n ) be an orthonormal wavelet basis and denote by X m : M. n y M. n 
the cyclic translation operator, defined by (T m u n )(k) = u n (k — m) for u n S W 1 and all 
k,m S {0, . . . , n — 1}. Cycle spinning then averages the wavelet soft-thresholding estimates 
of the translated data T m u n over all shifts m = 0, . . . , M — 1, where M is some prescribed 
number of considered translations. Hence, one defines 



U n ,M ■= Yl T -™ W n S (W n T m K, T) . (4.1) 

m=0 

The following elementary Lemma [4.5l states that the cycle spinning estimator (14. ip is equal to 
the soft-thresholding estimator defined by Equation (|2,6p corresponding to the overcomplete 
wavelet frame 

V nM := (T_ m Vj,fc : (j, k, m) G fi^ ) with VL nM := VL n x {0, . . . , M - 1} . (4.2) 

Hence wavelet cycle spinning fits into the general framework of soft-thresholding introduced 
in Section [2j 



Lemma 4.5. Let T> Hi m be the overcomplete wavelet frame defined in \4->ty an d denote by 
~W n ,M '■ - > M nM the corresponding analysis operator. Then, the cycle spinning estima- 
tor |^.i[ ) has the representations 

u n ,M = W* n>M o S (W n , M K, T) = W+ M o S (W n , M K, T) . (4.3) 

Hence the cycle spinning estimator equals the soft thresholding estimator corresponding to 
the redundant wavelet frame T> n ,M- 

Proof. The first identity in (I4.3D immediately follows from (14. 2D and (]4.ip . Next we ver- 
ify the second equality. Since the decimated wavelet transform W n and the translation 
operators T m are isometries, we have 

M-l 

|2 



^n,M u n\\ = ^ ||W n T m tt n || = M\\u n \ 



m=0 



whenever u n are the scaling coefficients of a member u of the wavelet space W. Hence T , n ,M 
is a tight frame with frame bound equals M. This implies that the dual synthesis operator 
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WZm corresponding to the cycle spinning frame is simply given by W+ M = W* M /M, 
which yields the second equality in (|4.3p . □ 

In the following we will show that redundant cycle spinning frame is asymptotic stable and 
thus allows the application of our general results. Strictly taken, these conditions do not 
hold for the frame T> n ^i since some of the elements T m ipj ^ occur more than once in P ni M- 
In particular, the cardinality of the set {T m '4'i,k '■ (i> k, m) E Q n ,M} is strictly less than 
l^n.Afl = nM; the exact number of different frame elements is computed in the following 
Lemma 14.61 Asymptotic stability will then be satisfied for the frame that contains every 
element T m ^^ exactly once. 

Lemma 4.6. For any M < n, the number of different elements of the frame T> nt M defined 
in is given by 

|{T m ^- k : (j, k, m) E Q n ,M}\ = n L lo g 2 M\+M (2^ ™/ M l - l) (4.4) 



Proof. The definition of the wavelet basis implies that ipj ^ = T^-j^ipjfi for very (j, k) E £l n 
and hence the periodicity of ifrj Q implies that 

TmV'^fc = T m+n2 -jfcV'jr',0 = 1pj,k+m23/n ■ 

whenever m2 J /n is an integer number. This shows that for every given scale index j E 
{0, . . . , log 2 re—l}, there are exactly min{n, M2 J } different wavelets. One concludes that 

\{T m ^j,k : {j, k, m) E n n>M }\ = n \{j : n/M < TP < n/2}\ +M ^ 2* 

M2i<n 
[Tog 2 n/Afl-l 



n (log 2 n - |"log 2 j ) + M 2 j . 



This shows Equation (|4.4p . □ 



In the following we shall for simplicity assume that M, the number of shifts in the cycle 
spin procedure, is an integer power of two. In this case, Equation (|4.4|) simplifies to 

I {T m ^, fc : (j, k, m) E Q n , M } | = n log 2 (Af) + n - M . (4.5) 

Note that this is significantly smaller (at least for large M) than the naive bound Mn given 
by the cardinality of ^l n ,M- 

Theorem 4.7 (Thresholding using cycle spinning). Let M be any fixed integer power of 
two, denote by T> n ^M = (TmVj.fc : {ji k, m) E £l n ,M) the overcomplete wavelet cycle spinning 
frame and by ~W n ,M the corresponding analysis operator. Then the following assertions hold 
true: 
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(a) Distribution: Let (e n ) ne pj be a sequence of noise vectors in W 1 with independent 
N (0, a 2 ) -distributed entries. Then, the sequence \\ ~W n ,M £n\\oc is of Gumbel type with 
normalization constants aa{x-, n ) (defined by II 1-6]) ) and abM (x n ) > where 

u ( \ ATi , -loglogn-log7r + 21oglog 2 (M) 

2v21ogn 

(b) Confidence regions: Let a n £ (0,1) be a sequence of significance levels converging 
to some a G [0,1) and let T(a n ,\T> n ,M\) denote the corresponding EVTs defined in 
HUH). Then, 



lim P{\\W njM {u n -V n )\\ oa <T(a n ,\V n> M\)^u n eR In } = 1 



a . 



(Here by some abuse of notation Af| denotes the number of different elements in 
that frame.) The same holds true if we replace T(a n , |£V,m|) by 

T M (a n , n) := -aa (x, n) log log (l/(l - a n )) + obu (x, n ) ■ (4.6) 

(c) Smoothness: Let n* M denote the soft-thresholding estimator using either T(a n , \T> n \) 
° r T n ^M(ct n ) as threshold. If the considered mother wavelet has m > r vanishing mo- 
ments and is m times continuously differentiable, then 



liminf P < \\u* m\\b t < \\ u \\t3 r ;VnG^!j„>>l 



a . 



Proof. By using the characterization of the cycle spinning estimator in Lemma 14.51 and the 
cardinality computed in Lemma [4,61 the proof follows the lines of the proof of Theorem 14.41 
Again, one simply uses the fact that the discrete inner products coincide with continuous 
ones of functions forming an infinite dimensional frame. However, notice the change of the 



normalization sequences in Item (a) which is also used for the threshold T nj 7v/(a n )- As easy 



to verify we have the asymptotic relation 

2z — log log n — log 7r + 2 log log 2 (M) 



a (x, n)z + b M (x, n) = a/2 log n + 



2 v / 2bgn 



; n 



= a(x, \T> n ,hi\) z + b (x, \Dn,M\) + (l/V 21 °g' 
From basic extreme value theory it follows that we can replace the sequence a(x, \T > n,M\) z ~ s< ~ 



b(x, \T>n,M\) by the one considered in Item (a) and for the threshold T ny M{a n )- D 



Remark 4.8. This alternative form for the EVTs for cycle spinning denoising has 

been introduced to allow a better comparison with the EVTs used in the basis case. It fact, 
it can be seen that the the extreme value thresholds T/v/(a n ,n) for the redundant wavelet 
frame T> n ^M simply increase by the additive constant (log log 2 M) / \/2 logn when compared 
to extreme value threshold T(a n ,n) for the non-redundant wavelet frame. ♦♦♦ 

The sharp results in Theorem 14.71 require M to be a fixed number. In the fully translation 
invariant transform, to be discussed next, one takes M = n dependent on the discretization 
level. This effects a strong dependence of large scale coefficients and that the distributional 



limit of Item (a) in Theorem 14.71 will not longer hold true. 
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4.1.4 Fully Translation Invariant Denoising 



Translation invariant wavelet denoising introduced in |16t l4"2j |4"7] 09] is similar to cycle 
spinning denoising. However, now one takes the whole range of M = n integer shifts 
instead of taking it as a fixed number independent on n. That is, the translation invariant 
wavelet estimator is defined by 

^ n— 1 

u n ,n ---Y, T_ m W; o S (W n T m K, T) . (4.7) 

m=0 



Lemma 14.51 implies that u n ^ n equals the soft-thresholding estimator W+ n S(W„ )n l / ra , T\ 
where W njn denotes the analysis operator corresponding to the translation invariant wavelet 
frame 

T> n ,n = (T m Vj" fc : U, k) G £l n and m G {0, . . . , n - 1}) . 

Equation (I4.5P shows that the translation invariant wavelet frame contains n log 2 n different 
elements. Further, from the proof of this Lemma it follows that T> n ^ n is a tight frame with 
frame bound equals n. After removing multiple elements in D n ^ n , the resulting frame is 
non-tight but still has upper frame bounds B n = n tending to infinity as n — > oo. One 



concludes that Condition (ii) fails to hold for the translation invariant wavelet transform. 
The increasing frame bounds somehow reflect the increasing redundancy and dependency 
of the coarse scale wavelets with increasing n. 

One might conjecture that still the distribution result of Theorem 14.71 holds true with M 
replaced by n. However, we shall show that this is not the case. Intuitively, the increasing 
correlation of the coarse scale wavelets with increasing n causes the maximum || Wn.^enHoo to 
be in probability smaller than the maximum of n log 2 n independent coefficients. Although 
the confidence regions in Theorem 14.71 still hold (as follows from Sidak's Lemma lA.9p . the 
confidence regions are no longer sharp and the considered radii are unnecessarily large. The 
following theorem gives a much smaller radius for these confidence regions; in particular 
this significantly improves [31 Theorem 4.4]. For simplicity we only consider continuously 
differentiable mother wavelets; similar statements could in fact be derived for any square 
integrable wavelet. 

Theorem 4.9 (Translation invariant thresholding). Assume that the mother wavelet ip is 
continuously differentiable, which implies that (ip * ip)(t) = 1 — c 2 t 2 /2 + o(t 2 ) for some 
constant c. (Here * denotes the circular convolution and tjj (s) := ip (— s).) Further denote 
by W n n the corresponding discrete translation invariant wavelet transform. 

Then, the following assertions hold true: 

(a) Distribution: Let (e n ) ne N be a sequence of noise vectors in M. In with independent 
N (0, <7 2 ) -distributed entries. Then, for every z£R, 

liminfp(||W n , n e„|l < y^i^ + ili^M! > eX p {-e~ z ) . (4.8) 

rwoo [" 1100 a/2 log n J 
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(b) Confidence regions: Let a n G (0,1) be a sequence converging to some a G [0,1). 
Then, we have 

lim inf P { || W n , n (un - V n ) || <T n (a n ) ;Vu„, GR 7 "} > 1-a, 

n— »oo L 11 v /moo v / J 

w/ten using the thresholds T n (a n ) := (7-^2 log n + a (2 log n)~ 1/2 loglog(l/(l — a n )) + 
log (c/vr). 

(cj Smoothness: Let u^n denote the soft-thresholding estimator using the threshold 
T n (a n ) defined in Item \ (b)\ If the considered mother wavelet has m > r vanishing 
moments and is m times continuously differentiate, then 



liminf P \ \\u* n llgr < HuIIst ;VtiGB! J 



> 1 - a. 



Proof. The key to all results is the distribution bound given Item (a) Its proof is somehow 
technical and is presented in Appendix lB.2l The other claims follow from Item (a) combined 
with the results of the previous sections (namely Theorems 11.21 [37T1 and [3~1~0|) . and are verified 
as the corresponding statements in the proof of Theorem 14.41 □ 

Remark 4.10. Consider a sequence of standardized normal vectors rj n each of them hav- 
ing Mn (log nf independent entries, where M is some fixed integer and r > some fixed 
nonnegative number. From Proposition \A.5\ we know that ||?7n||oo 

is of Gumbel type with 
normalization sequences a (x, Mn (log n) r ) and b(x,Mn (logn) r ). One easily verifies that 

a (x, Mn (log n) r ) z + b(x, Mn (log n) r ) 

Pn i (-V2 + r)loglogn + log(M/V?) / _ n 

= a/2 log n + i '- + o 1/ y/2 log n . (4.9) 

V2 log n V / 



T/iis allows to compare the bound in h4-8\ ) with the asymptotic distribution of a certain num- 
ber of independent random variables. Indeed, comparing f^.ffp with we can conclude, 
that W„ )n 6„ is smaller in probability than the maximum of Mn \/logn independent normally 
distributed random variables with M := [cy/7r\. Hence /j[4.8\ ) improves the primitive bound 
obtained from the distribution ofnlogn independent coefficients by a factor \/log n/c. *♦♦ 

Remark 4.11. It is a difficult task to compute the asymptotic distribution of the trans- 
lation invariant wavelet coefficients exactly. This is due to the fact that for coarse scales 
the coefficients get increasingly correlated, whereas on the fine scales the correlations re- 
main bounded away from a 2 . No appropriate tools for asymptotic extreme value analysis of 
such mixed type random fields seem to exist. Nevertheless, we believe that the the maxima 
of the translation invariant wavelet coefficients are of Gumbel type but with even smaller 
normalization constants than the ones used in ft4-8\ ). In particular, it may even turn out 
that the threshold a\/2 log n (known from the basis case) has the denoising property for the 
translation invariant system. ♦♦♦ 
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4.2 Curvelet Thresholding 

Second generation curvelets (introduced in [9], [10], [TTJ ) are functions ipj t e t k m ^ 2 depending 
on a scale index j G N, an orientation parameter i G {0, . . . ,4 • 2^V 2 1 — 1} and a location 
parameter A: E Z 2 . They are known to provide an almost optimal sparse approximation of 
piecewise C 2 functions with piecewise C 2 boundaries (as shown in [9]); this class of functions 
usually serves as accurate cartoon model for natural images. The main curvelet property 
yielding this approximation result is the increasing anisotropy at finer scales. This feature 
also distinguishes them from standard wavelets in higher dimension. 

Remark 4.12 (Shearlets and contourlets). There exists other related function systems with 
similar properties. The cone adapted shearlet frame (introduced in f28[ [Wj \4Tj ) is very 
similar to the curvelet frame and shares its optimality when approximating piecewise C 2 
images with piecewise C 2 boundaries, see [29]. Yet another closely related function system 
are the contourlets introduced by Do and Vetterli \2(K \21^ . For simplicity we focus on the 
curvelets; similar statements could be made for the shearlet and contourlet frames. ♦♦♦ 

4.2.1 Discrete Curvelet Frames 

The discrete curvelet transform computes inner products of u G M™*™ with discrete curvelets 
V^fc ^ K™ xr \ As for the wavelet transform, the elements tjijtk are n °t computed explic- 
itly and defined implicitly by the transform algorithm. Different implementations of the 
continuous curvelet transform give rise to different discrete frame elements i/jj'ek- ^11 cur " 
rent implementations of the curvelet transform are computed in the Fourier domain. Below 
we shall focus on the wrapping based implementation of the curvelet transform introduced 
in [8|. This transform is an isometry which makes the computation of its pseudo- inverse 
particularly simple. 

Let n = 2 J be an integer power of two with J denoting the maximal scale index. The discrete 
curvelets and the discrete curvelet transform are composed of the following ingredients: 

• First, define A n as the set of all pairs (J, £) satisfying j G {0, . . . , log 2 n — 2} and 
£ G {0, . . . , 4 • 2^/21 — 1}. The index sets of the discrete curvelets is defined by 

n n := ((j,£,k) : (j,£) G A n and k G . 

where Dj £ = {0, . . . , Kjgi—1} x {0, . . . , Kj^. 2 —1} for certain given numbers Kj^.\ ~ 2 J 
and ~ 2 J / 2 . One refers to j as scale index, £ as the orientation index, and k the 

location index. 

• Next, one constructs smooth nonnegative window functions Wji- M 2 — )• M satisfying 
the identity 

J-2 4-2^/21 _i 

(vzgm 2 ) E Kl(*)| 2 = l. 

j=0 £=0 

The functions Wj£ are essentially obtained by anisotropic scaling and shearing a single 
window function; see [8] for a detailed construction. 
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• For any index triple (j,£,k) G £l n the discrete curvelet at scale j, having orientation 
£/2^V 2 l and location k = (ki/K, fe/^j ^2) is defined by its Fourier representation 

(F„<„ fc ) (m) = Wj ' e (m) e-^imik./K^-m^/K^) _ (4 1Q) 

Here the coefficients Cj^ are chosen in such a way that HV^fcll = 1 and F n denotes 
the discrete Fourier transform. 

• Finally, one defines the curvelet frame V n = (ift'jg k ■ (j, £, k) G Q n ) and denotes by 
C„, : M nxn — > Mp" the corresponding analysis operator, which has been named digital 
curvelet transform via wrapping in [5]. In the following we will refer to C n simply 
as discrete curvelet transform. We emphasize again that the implementation of the 
discrete curvelet transform does not require to compute the curvelets tp™ £ k explicitly. 

Implementations of the discrete curvelet transform and its pseudoinverse using C(n 2 logn) 
operation counts are freely available at http : / / curvele t ,org[ Although this implemen- 
tation does not use normalized frame elements, the constants Cj g can easily be computed 
after the actual curvelet transformation and applied for normalizing the curvelet coefficients 
prior to denoising. The denoising demo f dct_wrapping_demo_denoise .m included in the 
curvelet software package in fact computes the norms of the discrete curvelets and uses them 
for proper scaling of the chosen thresholds. 



4.2.2 Curvelet Denoising 

We now consider our denoising problem (jl.ip . which, after taking the discrete curvelet 
transform, simply reads Y n = x n + C n e n . As usual, the estimator we consider is soft 
thresholding x n = S (Y n , T) of the curvelet coefficients. 

Similar to the wavelet case, we measure smoothness in terms of the weighted (p, g)-norms, 
depending on certain norm parameters p, q > 1 and a parameter r > describing the degree 
of smoothness. More precisely, we define 



\p,q,r 




2^i\\x n {j,£,-)\\ q p with s = r + § (I - ^) 



These types of norms applied to the continuous domain curvelet coefficients have been 
defined and studied in [3]. In that paper also relations between these norms and classical 
Besov norms have been derived. 

Theorem 4.13 (Curvelet thresholding). The discrete curvelet frames T> n = (^"^^ : (j,£,k) G 
f2 n ) defined by Equation ^4-10 ) are asymptotically stable. In particular, the following asser- 
tions hold true: 
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(a) Distribution: Lete n be a sequence of noise vectors inM 7 ™ with independent N (0, a 2 ) - 
distributed entries. Then, for every z£R, 



lim P < ||C n e n 



2z — log log IfL 



log 7T 



2 v / 21og|^ n | 



exp (— e z ) 



f&) Confidence regions: Let a n G (0,1) 5e a sequence of significance levels converging 
to some a G [0, 1) and Ze£ T(a n , |O n |) denote the corresponding EVTs defined in 113.9)) . 
Then, 

lim P{||C n K-K)IL <r(a n ,|fi n |) ;«„eM'"} = l-o, 

(c,) Smoothness: Let x n denote the soft thresholding estimate using the extreme value 
threshold T(a n , |fJ n |). Then, with any of the norms defined above, we have 



lim inf P { 



x 



n\\p,q 



,r — || ^n^n g,r 5 G M j- ^ 1 



Proof. All frame elements are normalized due to the chosen scaling. Moreover, as shown 
in [SI Proposition 6.1], the discrete curvelet frame T> n is faithful to an underlying infinite 
dimensional curvelet frame obtained by periodizing the curvelets on the continuous domain 



M 2 . This immediately yields Condition (ii) Moreover, along the lines of |10| (which uses a 
slightly different curvelet system) one easily shows that the inner products satisfy 

for some constant p < 1 independent on n and all indices. This obviously implies Condi- 
tion (i) All claims in Items (a) (c) then follow from Theorems 11.21 13.71 and 13.101 derived in 
the previous section. □ 
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A Statistical Extreme Value Theory 

Let (O n ) ng N be a sequence of finite index sets with monotonically increasing cardinalities 
|fi n | satisfying liim^oo |fl n | = oo. Moreover, for every n G N, let £ n := : w £ ^n) 

be given standardized normal random vectors, which means that CnW) ~ AT (0,1) for every 
n G N and uj G f2 n . We are mainly interested in random vectors with dependent entries, in 
which case Cov(£ n (a;) , £ n (a/)) ^ for at least some pairs (oj,oj') G O 2 with u ^ uj'. 
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As the main result of this section we derive the asymptotic distribution of ||£ n ||oo f° r a 
sequence £ n of dependent normal vectors whose covariances satisfy a certain summability 
condition (see Theorem IA.8p . Whereas similar results are known for max (£n)> to the best 
of our knowledge, such kind of results are new for ||£ n ||oo- Since |£ n (cj)| 2 ~ \ 2 is chi-squared 
distributed with one degree of freedom, our results can also be interpreted as new results 
for the asymptotic extreme value theory of dependent x 2_ distributed random vectors. 

A.l Maxima of Normal Vectors 

We will start by reviewing and slightly refining the main results from statistical extreme 
value theory for maxima of normal vectors as we require them in this paper. 

The most basic extreme value result deals with the case where the components of £ n are 
independent. In this case it is well known, that, after rescaling, max f£ n ) converges to the 
Gumbel distribution as n — > oo. 

Proposition A.l. Let (£n) ngN be a sequence of standardized normal random vectors mR^ n 
with independent entries. Then the maxima max (£ n ) are of Gumbel type (see Definition \1.3\) 
with normalization sequences a(\{l n \,n), b(\{l n \,n) defined by \3. 3\) . 

Proof. See, [H2 Theorem 1.5.3]. □ 

If the entries of £ n are dependent, then the result of Proposition IA.1I does not neces- 
sarily hold true. There is, however, a simple and sufficient criterion on the covariances 
Cov (^ n (o;), of a sequence of dependent normal vectors such that the maxima still 
are of Gumbel type with the same normalization sequences. This criterion is an immediate 
consequence of the so called normal comparison Lemma or Berman's inequality (see [431 
Theorem 4.2.1]). For later purpose, where we study ||£ n ||oo instead of max (£n)> we require 
a quite recent improvement of this important inequality which is due to Li and Shao |44j . 
The standard form of the normal comparison Lemma [43, Theorem 4.2.1] has already been 
applied for redundant wavelet systems in [3l HH], which however, only yields results for 
maxima of £„, without taking absolute values. We stress again, that taking absolute values 
slightly change the constants in contrast to relations (jl.3p and ([L 



Lemma A. 2. Let r] n , be standardized normal random vectors in Mp n , denote its co- 
variances by K Vn (u,u') := Cov (rj n (u}), r/ n (ui')) , K^ n (oj,oj') := Cov (£ n (u) , £n(fc/)) , and set 
p n (uj,uj') := max{\n Vn (uj,uj')\, \K^ n (u,co')\} . Then, for all T n el, 

P {max (rj n ) < T n } - P {max < T n } < 

(arcsin(^„(w,w')) - arcsin (^ n (w,a;'))) + exp ( 1 + p J^,) ) • 

Here z + = max{z, 0} denotes the positive part of some real number z E R and arcsin denotes 
the inverse mapping of sin: [— 7r/2,7r/2] — > [—1,1]. 
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Proof. See [SJ Theorem 2.1]. □ 

In the special case where r\ n has independent entries, Lemma rA.2l has the following immediate 
consequence given in Lemma [A. 3 1 This allows to extend Proposition lA.il to certain sequences 
of dependent random vectors by comparing them with independent ones. 

Lemma A. 3. Let r] n ,^ n be standardized normal random vectors in M. n ™. Assume that the 
entries of rj n are independent, and let K n denote the covariance matrix of £ n defined by 
Kn(u},u)') := Cov (£ n (uj) ,£ n (u/)). Then, for all T n £ R, 

|P {max (r/n) < T n } - P {max (£ n ) < T n }\ 

sl£KMN(- 1 + ^l )- ("J 

Proof. See O Corollary 2.2]. □ 

Lemmas IA.2I and IA.3I are significant improvements of the standard versions of the normal 
comparison Lemma [I3J Section 4] due to the absence of a singular factor (l—\p n (uj, w')| 2 ) 
that is contained in earlier versions. It is in fact the absence of this singular term that we 
require for deriving an inequality similar to the one in Lemma IA.3I that compares the dis- 
tributions of Hindoo and || £n 1 1 co for two normal vectors r\ n and £ n . 

Theorem A. 4. Let (£n) ngN be a sequence of standardized normal random vectors in 
having covariance matrices K n G R n ™ xn ™ satisfying 



Then, the maxima max (£ n J are of Gumbel type (see Definition 11,3]) with normalization 
constants a(N,\CL n \), b(N, \Sl n \) defined by / TO)) . (34\ ). 

Proof. Fix some z € M and define T n := a(N, \fl n \)z+b(N, \ fl n \). Then, the definitions of the 
normalization sequences a(N, \£l n \) and b(N, |O n |) imply that T 2 = 21og |fi n | —log log |O n | + 
O (1) as n — > oo. Hence there is some constant C > and some index uq 6 N, such that for 
all n > no, we have 

( Tl \ ( log(|^| 2 /log|^|) ^ (\og\n n \^ l+ ^^ 
ex P — i n 7 7TT <Cexp[ — — — = C" 



l + \K n ((jJ,U}')\J y l + \Kn(u),u)')\ J V |^n| 2 

Now let Equation (|A.2|) be satisfied and let (r] n ) n ^fq be a sequence of standardized nor- 
mal vectors with independent entries. Then, the triangle inequality, Lemma lA.3[ and the 
estimate just established imply 



in I 2 



l/(l+|re„(u;,u/)]) 



|P{max(£ n ) <T n } \ < |P{max(?7 n ) <T n }\+— ^ |K n (w,a/) 
Hence the claim follows from Proposition IA.1I and Assumption (|A.2|h □ 
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A. 2 Maxima of Absolute Values 



In the following we derive results similar to Proposition I A . 1 1 and Theorem IA.4I for ||£ n ||oo m 
place of max (£ n ). The first auxiliary result, Proposition IA.51 deals with the independent 
case. It is easy to establish but nevertheless seems to be much less known than the corre- 
sponding result in the normal case. We include a short proof based on the known extreme 
value distribution of independent x 2_ distributed random variables. The second and main 
result in this section, Theorem I A. 8\ deals with the dependent case. It is a new contribution 
and based on a novel inequality for comparing the distributions of Halloo and ||£n||oo (given 
in Lemma |A,7|) , 

Proposition A. 5. Let (£n) ngN be a sequence of standardized normal vectors inMp n having 
independent entries. Then Halloo is of Gumbel type (see Definition \1.3\) with normalization 
sequences a(x,\Q n \), b(x,\Q n \) defined by hi. 0(1 , {1.70 . 

Proof. Since £n(u;) is standard normally distributed for any u> £ £l n , the random variables 
|£ n (u;)| 2 are x 2_ distributed with one degree of freedom. The x 2 -distribution is in turn a 
member of the family of Gamma distributions Fp ^ corresponding to j3 = 7 = 1/2. The 
asymptotic extreme value distribution of the Gamma distribution Fp ~ is known (see [26} 
page 156]) and implies 

lim P{||£n||L < 2z + 21og|Q„| -loglog|O n | -log^} = exp (-e~ z ) . (A.3) 

n— >oo 

Moreover, a Taylor series approximation shows 

yjlz + 21og |f2 n | - log log |fy - log7T 

= a(x, \£l n \)z + b(x, |fi n |) +o(a(x, \^n\)) as n -> 00 . (A.4) 

Any o(a(x, |J) n |))-term can be omitted when computing extreme value distributions (see 
[4~3"1 Theorem 1.2.3]), and hence Equations (1A.3P and ()A.4j) imply the desired result. □ 

Remark A. 6. The sequence b(y, \Q n \) used for normalizing the maximum ||Cn||oo ^ n Propo- 
sition lATEi is different from the sequence b(N, \£l n \) used for the normalization of max (£ n ) 
in Proposition \A.ll Indeed, as easily verified, 

b(N, 2 |0 n |) = b( X , |0 n |) + o {a{N, 2 |0„|)) . 

Again, the o(a(A r , 2 |fi n |)) term can be omitted in the extreme value distribution and hence 
1 1 in 1 1 00 behaves equal to the maximum o/2|f2 n | (opposed to \£l n \) independent standard nor- 
mally distributed random variables. Using different arguments, this has already been observed 
in Section 8.3]. ❖ 

If the entries of £ n are not independent, then the result of Proposition IA.5I does not nec- 
essarily hold true. If, however, the correlations of £ n are sufficiently small, then, as in the 
normal case, we will show that the same Gumbel law still holds. This result follows again 
from a comparison inequality, now between the distributions of ||£n||oo and Halloo with 
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some reference normal vector r] n , to be derived in the following Lemma I A. 71 For the sake of 
simplicity we assume that the vector r) n has independent entries; in an analogous manner a 
similar result could be derived for comparing two dependent random vectors. 

Lemma A. 7. Let r] n ,£ n be standardized normal random vectors in MP 1 ™ . Assume that the 
entries of rj n are independent and denote by K n E R^« x ^n the covariance matrix of £ n , 
having entries K n (u, uj') := Cov (£ n (uj) , (a/)) . Then, for all T n E K, 

\p{\\v n \\oo<T n }--p{u n \\ 00 <T n }\<jY t \Kn(oj,oj')\ expf- r ; 2 — m) ■ ( A - 5 ) 

4 y 1 -f- \K n [UJ, UJ )\ J 

Proof. The proof uses the normal comparison Lemma IA.2I of Li and Shao applied to the 
strongly dependent random vectors Y n := (r] n , — %) and X n := f£ n , — £ n ) in place of rj n and 
£ n . To that end, we first note that obviously {|£ n | < r n } = {X n < T n } and {|r/ n | < T n } = 
{Yn < T n }. Moreover, the covariance matrices of Y n and X n are block matrices of the form 

Cov(Y n )=(_] n ~] n ) and Cov(X„) = ( _ Kn ~ Kr 

where n n = Cov f£ n ) denotes the covciricincG nicitrix of ^ n and. \ n — Cov (Y n j is the identity 
matrix in ]g>^™ x ^™. Now applying Lemma IA.2I with Y n and X n in place of r\ n and £ n yields 

P{Woo<rn}-P{||Cn|loo<rn} 

= P {max (r? n , < T n } - P {max (£ n , -£ n ) < T„} 

ji2 



1 ( T 

<— V ((-arcsin(K ri (a;,a; / )))+ + (arcsin(/€ n (w, ex P ( _ TT1 — ( i\\ 

2TT \ 1 + \K n {UJ,Uj')\ 

= — |arcsin(— K n (uj, oj'))\ exp ( ; — 7 — - ) . 

2vr 1 M JM P V 1 + |k„(w,w')I/ 



Here for the first estimate we used that the two sums over the diagonal blocks give the same 
value, that the same is the case for the two off-diagonal blocks, that all terms having uj = uj' 
cancel and that Cov(X n ) (uj,uj') = for uj ^ uj' . Interchanging the roles of £ n and rj n yields 
the same estimate for P{||£ n ||oo < T n \ — P{||f/n||oo < T n \ and hence implies 

|P{||r?n|| 00 <T n }-P{||^|| 00 <T4| 

1 / T 2 \ 
- ^~ |arcsin(K„(o;,a;'))| exp -— - — 2 . (A.6) 

27T 1 V 1 + \K n {0J,Uj')\J 

Finally, the estimate |arcsinf| < \v\ ■ ir/2 for v E [—1,1] and inequality (|A.6|) imply the 
claimed inequality (|A.5[) . □ 

The following theorem is the main result of this section and the key for most results estab- 
lished in this paper. 
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Theorem A. 8. Let (£ n ) ngN be a sequence of standardized normal vectors in Mp n having 
covariance matrices K n G E " x " satisfying Equation /(A.fy) . Then Halloo is of Gumbel type 
(see Definition \1.3\) with normalization constants a(x, \&n\)> b(x,\Q n \) defined by 111.6]) . 

Proof. This is analogous to the proof of Theorem IA.41 Instead of Proposition IA.1I and 
Lemma lA.3l one now uses Proposition IA.5I and Lemma IA.7I □ 

Equation (|A.2h provides a sufficient condition for the extreme value results of Theorems IA.4I 
and IA.8I to hold. However, given a sequence (£n.) ngN of normal vectors with covariance 
matrices K n , it is not completely obvious whether or not (|A.2p is satisfied. In Section [3] we 
verified that (|A.2j) indeed holds in the case where £ n = (0™, e n ) are coefficients of standard- 
ized normal random vectors e n having independent entries with respect to an asymptotically 
stable family of frames : u E Q n ) . 

Occasionally we will make use of the following classical result due to Sidak [51] for bounding 
the maximum of the magnitudes of dependent random vectors by the maximum of the 
magnitudes of independent ones. 

Lemma A. 9 (Sidak's Inequality). Let rj n , £ n be standardized normal random vectors in M^ 2 ' 1 
and assume that the entries of r\ n are independent. Then, 

(VT G R) P{||Uoo < T} > P {felL < T} . (A.7) 

Proof. See [51 1. Corollary 1]. □ 

Note that a similar result also holds for the maxima without the absolute values, which 
bounds the probability P{max(£„) < T} of dependent standardized normal vectors from 
below by the probability P{max(?7 n ) < T} of independent ones. This one-sided estimate, 
however, requires the covariances of £ n being nonnegative. It is known as Slepian's Lemma 
and has first been derived in [54] . Interestingly, Slepian's Lemma immediately follows from 
the normal comparison Lemma IA.2} whereas this seems not to be the case for Sidak's two 
sided inequality. 

B Remaining Proofs 

B.l Proof of Proposition I3T31 

As already noted in [45j page 558], for the universal thresholds o-\j2 log \Q n \ this result 
easily follows by adapting the original proof of [23] (see also [Ml Section 8.3] and [431 
Theorem 11.7]) from the orthonormal case to the frame case. Indeed, as shown below a 
similar proof can be made for the extreme value thresholds T n = T(a n , |0 n |) defined by 
Equation $T9\) . 
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After rescaling we may assume without loss of generality that a = 1. Recall that the dual 
frame (<?i>™ : uj G Sl n ) has upper frame bound \/A n , that Q^^ri = Id is the identity on R /n , 
and that & n &n = -fRan(*„) equals the orthogonal projection onto the range Ran($ n ) C M n 
of the analysis operator <& n : — > Wp n . Moreover, we define the parameter x n = <& n u n 
and the data Y n = & n V n as in (12, 4p . Then we can estimate 



E (\\u n - #+ o S {& n V n ,T(a n , \Q n \))\ 

= E (||*+x n - *+ o S (<f> n V n ,T(a n , |0„|))| 
= E (||*+* n *+ (x n - S (Y n ,T(a n , \n n \))) 



<J-E(||p Raa($n) (^-s(y n ,r(a n ,|fi n |)))|| 2 

= 7- H E(|x n (u;)-S(y n (a;),r(a n ,|O n |))| 2 ) . 

Now we can proceed similar to |23| (see also |34| 05] ) to estimate the mean square errors 
E (jx n (uj) — S(Y n (Lu),T(a n , \Q n \))\ 2 ) of one-dimensional soft thresholding. 

To that end we use the risk estimate of [341 Section 2.7] for one-dimensional soft thresholding, 
which states the following: If y ~ N (^u, 1) is a normal random variable with mean /i£R 
and unit variance, then 

(VT>0) E(\fi- S(y,T)\ 2 ) < e"^ 2 + min {l + T 2 , ,u 2 } . (B.l) 



For our purpose we apply the risk estimate (|B.1|) with threshold T = T(a n ,|f2 n |). The 
definition of the threshold T(a n , \ Q n \) in (|3.9p immediately yields the estimate 

T(a n , \tt n \) 2 10 1 1 1 Ci /n \A log log |fi n | + log vr 
- '— > log \tt n \ - log log (1/(1 - a n )) - . 

Inserting these estimates in (|B.1|) applied with the random variables y = Y n {uj) having mean 
values [i = x n (u) and using the assumption T(a n , |f2 n |) < ^2 log |f2 n | yields 

E (|x„H - S(Y n H,T(a n , |ft n |))| 



log (1/(1 - a n )) yjn log \SQ . , 2 -. 

< r^-p h(l + 21og|S2 n |)min{l,|x n (w)| z } . 



Finally, summing over all cj E f2 n shows (|3.15p . 



B.2 Proof of Theorem WM 

Let rj = (t](t) : t € [0,1]) denote a white noise process on [0,1] and consider the periodic 
continuous domain wavelets ipj t b(t) = 2^ 2 %l){2^ (t — 6)). We then define the random vectors 
X n as inner products 

(Vj = 0, . . . , logn - 1)(W = 0, . . . , 2 j n - 1) X n {j, £) := <^/ n . v) ■ 
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Hence the random variables X n (j, £) are coefficients of the white noise process r\ with respect 
to a discrete wavelet transform, that is oversampled by factor n at every scale. Comparing 
this with the definition of the translation invariant wavelet transform we see that the trans- 
lation invariant wavelet coefficients W„ )n e„ can be embedded into the vector X n . Hence we 
have 

(VT>0) PQWn^enW^KTj^PqXj^ ^T} . (B.2) 

We proceed by computing the correlations of X n (j,£) for some fixed scale index. Since r\ is 
a white noise process, the definition of X n and some elementary manipulations shows that, 
for all j G {0, . . . , log n — 1} and all indices £, £' G {0, . . . , 2 J n — 1}, we have 



Cov(X n {j,£),X n (j,£')) = (V^,W'} = 2 j / V(2 j i - £/n)^{2H - £'/n)dt 

Jr 

= V [ ^{2h)^{2H- {£' - £)/n)dt= j ^(-t)^{{£- l')/n-t)dt = $ l')/n) 



Next we construct a random vector Y n with the same index set and pointwise smaller 
correlations. To that end, for every given j, we group the index set {0, . . . 2 J n — 1} into 2 3 
blocks Bj^k = {kn, . . . , (k + 1) n — 1} for any k £ {0, . . . , 2 J — 1}. We denote k := ifi * tp and 
define the matrix 



k ((£ - £')/n) if j = j' and (£, £') G \J k B j>k x B j>k 
otherwise . 



Hence we have R n ((j, £),(f, £')) = Cov (X n (j,£), X n (j',tf )) if j = j' and the indices £,£' 
are in the same block Bj^, and the correlations of R n are zero otherwise. Moreover R n is 
obviously symmetric and positive semidefinite and hence there exists a standardized normal 
random vector Y n whose covariance matrix is given by R n . By construction of R n , the 
covariances \Cov(X n (j, £), X n (j, £'))\ pointwise dominate the covariances \R n ((j,£),(j' ,£'))\. 
Hence, Lemma IA.9( which we have already used several times, implies that 

(VT>0) P{||X n || 00 <r}>P{||y n || 00 <T}. (B.3) 

Inspecting Equations (|B.2p and (jB.3|) shows that it remains to compute the asymptotic 
distribution of 1 1 Y n \ 1^. 

To that end recall that Cov (X n (j, £), X n (J, £')) =«((£ — £')/n) are densely sampled values 
of the autocorrelation function of the mother wavelet. This in particular implies that any 
block in Y n has the same distribution. Moreover, due to the independence of the blocks this 
yields 

P{\\Y n \\ oo <T} = P{m & x\Y n (0,£):£ = 0,...,n-l\ <T} n 

= (1 -P{max|y n (0,^) :£ = 0,...,n-l\ > T}) n 
= (l-P{max|(W/n,??)l :£ = 0,...,n-l} > T) n 
= (1 -P{max|X(^/n)| : £ = 0, . . . , n - 1} > T) n . 
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Here X = {X(t) : t G [0, 1]} is defined by X(t) := {ipoj,'*])- One easily verifies that X is a 
mean square differentiable normal process having covariance function n(t). Moreover the 
vector Y n (Q,£) = X(£/n) consist of n equidistant values of that process inside the unit 
interval. Hence for any sequence of thresholds T n that tends to infinity as n — > oo in a 
sufficiently slowly manner, one has the asymptotic relations (which follow from standard 
result of continuous extreme value theory [43] ) 

P{max{|X(£/n)| : £ = 0, . . . ,n - 1} > T n } ~ P {max{|J5f (t)| :iG [0,1]} > T n } 
P{max{\X(t)\ : t G [0,1]} > T n } ~ 2P{max{X(t) : t G [0,1]} > T n } 
P {max{X(t) : t G [0, 1]} > T„} ~ c/(2tt) exp (-T n 2 /2) . 

Now fix any z G K and define the sequence T n := (2(logn + z + 21og(c/vr))) 1/2 . Then 
the definition of T n immediately yields exp ( — = it/ (cn) exp(— z). Consequently, by 

collecting the above estimates, we have 

z \ n 

1 - — ) = exp (-e- z ) . 
n I 

Finally, a simple Taylor series approximation of the square root shows the asymptotic rela- 
tion 

T n = + X + l;f {CM + o (l/y/2^) ■ 

V 2 log n V / 

Recalling, for the last time, that o(l/a n ) terms can be omitted in the rescaling of extreme 
value distributions finally shows 

p{iy.ii.<^n: + i±^}.-p(-.^, 

and concludes the proof. 
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